Re: www.google.com Searches & Royal Society of Chemistry (fwd)
Is the RSC allowing web spiders into the site, and are major updates submitted to the web search engines so as to ensure a renewed grab by the spider? Also, do the titles of the web pages repeat the name of the organisation as well as the topic of the page, to reinforce the association? If not, it is not surprising that little comes up on the RSC.
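For anyone wanting to check the first question on their own site, here is a minimal sketch. It assumes the site controls spider access through a robots.txt file (the thread does not say how the RSC site is configured), and the rules, the spider name "ExampleSpider" and the example.org URLs are all hypothetical. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every spider may crawl everything
# except the /private/ area. This is NOT taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def spider_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for page in ("http://www.example.org/journals/index.htm",
             "http://www.example.org/private/draft.htm"):
    print(page, spider_allowed(ROBOTS_TXT, "ExampleSpider", page))
```

To test a live site, RobotFileParser can also fetch the file itself via set_url() and read(); the version above parses a string so that it runs without network access.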
From month to month, spiders can account for 20 to 75% of downloads from the academic domain I look after, http://www.ccp14.ac.uk. This is good: it means that (in theory) the information can be easy to find on the search engines, at least for the tests I have tried.
However, be wary that the search engines that search other search engines can be perturbed. For instance, if you submit only to AltaVista, then only one search engine returns the hit, and when the results are averaged out it may hide in the noise despite being spot on with the keywords.

Having recently released some oral talk notes on the web, I have had a chance to test this. One of the talks was about Phase Identification using Powder X-ray Diffraction, a quite specialised topic. This was submitted to AltaVista (http://www.altavista.com/), and searching there on Phase Identification did put it at the top of the list. (Today, however, it looks like it is hidden in the noise, so these things could depend strongly on how recently a spider crawl was done, or on other parameters inside the systems.) But trying a normally excellent search engine like www.infind.com, at a time when the page was top of the list at AltaVista, gave no hit at all.

I am not sure that clear conclusions can be drawn, but the relevance of hits from search engines seems to fluctuate. I wonder whether they are updating their indexes often enough, or whether additional information puts noise into the system that takes a while to clear up. In the case of www.infind.com, it used to be top notch for "relevant" hits but seems to be fluctuating lately, such that www.google.com is presently doing a better general job. Of course, this may vary depending on the type of search being requested.

Lachlan.
Hello Chemweb (Com & Imperial varieties),

I liked the information from Adam Hodgkin and used Google to demonstrate search engines to a student. The results were quite nice (some peculiar results were found, as usual, but easily sorted). However, when my own "name" and "chemistry" were input, I was surprised to find so little reference to the Royal Society of Chemistry (my top hits were from ChemWeb.com and the Chemical Structure Association; full marks to them!).

Perhaps the RSC would like to comment, since they are THE professional organisation for chemists, particularly in view of adverse comments (not from me) on other aspects of their efforts previously circulated via this list. Perhaps they or the Royal Society should sponsor a chemistry (science) biased search engine we could all be happy with.

On a related matter, I recently received an information leaflet about the RSC's Library & Information Centre. I liked the idea of using their CD-ROMs and possibly doing on-line searches myself (preferably using CAS Sci-Finder), but was told access is only available from London. This is a poor service for the majority of members not located near London; it explains why I have never made use of this facility in 30 years of active chemistry, and with modern IT there is very little excuse for it.

I will forward this email (and fax a copy to the editor of Chemistry in Britain, just in case); perhaps he might include it in the "Comments" feature of Chemistry in Britain for those members not served by the ChemWeb list.

Bernard.
On Wed, 7 Apr 1999 11:38:55 +0100 "Rzepa, Henry" <h.rzepa@ic.ac.uk> wrote:
From: "Adam Hodgkin" <adamgh@dial.pipex.com>
Hello Chemweb
There is a very interesting new search engine on the block. Try www.google.com
I tested it with some obvious search terms (ATP, Sodium Chloride, Wendy Warr, Henry Rzepa, MDL, ChemWeb etc) and found useful results.
Particularly handy is the ability to disambiguate (ATP collects lots of Advanced Technology Programs etc, but you clear the fog by simply adding 'chemistry' to your search term). AND the coolest feature: you can immediately click on red bars which show the extent to which the page is cited (hyperlinked from other pages in the Google repository). This is a useful way of ranking importance/relevance/webcentrality.
The other features of the system are well explained at the site, but I would add that the designers have put in some very cool and simple design features which they don't brag about. They style it a 'beta' version, but I am beginning to use it in preference to AltaVista, Excite etc.
I think it will have good special applications for chemistry.
Adam
Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY; mailto:rzepa@ic.ac.uk; Tel (44) 171 594 5774; Fax: (44) 171 594 5804. URL: http://www.ch.ic.ac.uk/rzepa/
chemweb: A list for Chemical Applications of the Internet. To post to list: mailto:chemweb@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/chemweb/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe chemweb List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
---------------------- Dr. B. Blessington Pharmaceutical Chemistry Dept., Bradford University. Bradford BD7 1DP. U.K.
email: b.blessington@bradford.ac.uk tel: 44 (0) 1274 234704 WWW: http://www.student.brad.ac.uk/bblessin/
-- Lachlan M. D. Cranswick Collaborative Computational Project No 14 (CCP14) for Single Crystal and Powder Diffraction Daresbury Laboratory, Warrington, WA4 4AD U.K Tel: +44-1925-603703 Fax: +44-1925-603124 E-mail: l.cranswick@dl.ac.uk Ext: 3703 Room C14 NEW CCP14 Web Domain (Under heavy construction): http://www.ccp14.ac.uk
Is the RSC allowing web spiders into the site, and are major updates submitted to the web search engines so as to ensure a renewed grab by the spider? Also, do the titles of the web pages repeat the name of the organisation as well as the topic of the page, to reinforce the association? If not, it is not surprising that little comes up on the RSC.
From month to month, spiders can account for 20 to 75% of downloads from the academic domain I look after, http://www.ccp14.ac.uk. This is good: it means that (in theory) the information can be easy to find on the search engines.
On this theme, we have written a little program called "meta-hunter", which seeks out metadata, in whatever form, from remote web sites. Arguably, since meta-data is often given a weight factor 10-20 times that of "body text", carefully selected metadata can significantly improve the quality of search results. AltaVista is one search engine that clearly indicates that it indexes meta-data.

Our trawl of a number of carefully selected sites, including the RSC, reveals that predominantly the only meta-data most sites have is the "generator" field, which tells the company that wrote the authoring software used to prepare the site that their product was used. At most, what one also gets is "description" and "keywords".

We are urging people to use the Dublin Core variant of meta-data. I would like to propose that one element of this scheme, namely

<META NAME="DC.Type" CONTENT="chemical">

be used to identify that the page is "predominantly" chemical in its content. Since such a field can give, say, a weight of 10 or 20 to the term "chemical", any subsequent search of an index using X and chemical as the search terms would return only pages which have genuine chemistry in their content.

To see how Dublin Core works, go to http://www.ukoln.ac.uk/metadata/dcdot/ If you point it at our web site, for example http://www.ch.ic.ac.uk/, you will see the metadata we have entered, at least on the root document of our site.

We will release the meta-hunter program in the near future (at the moment it is a Java applet, but rather hungry in its memory requirements, which we are working on reducing).

Finally, assuming that the chemical community agrees on at least some common meta-data declarations, the stage would be set for genuine chemical indices of high-value content. Still, I remain cynical that the community will ever be happy with a "single" chemistry portal into such content.

Maybe I should also suggest that we all move to XML. Once that is done, the world will change again! You ain't seen nothing yet! Watch this space for some interesting stuff in this area!!

Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY; mailto:rzepa@ic.ac.uk; Tel (44) 171 594 5774; Fax: (44) 171 594 5804. URL: http://www.ch.ic.ac.uk/rzepa/
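To make the idea of trawling a page for META fields concrete, here is a minimal sketch in Python. It is not the meta-hunter program described above (that is a Java applet and has not been released); it only illustrates the general technique using the standard html.parser module. The sample page, its title and the generator name "SomeAuthoringTool 1.0" are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike; attribute values keep their case.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        if name is not None:
            self.meta[name] = fields.get("content") or ""


def collect_meta(html_text):
    """Return a dict mapping each META name to its content string."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """<HTML><HEAD>
<TITLE>Example chemistry page</TITLE>
<META NAME="generator" CONTENT="SomeAuthoringTool 1.0">
<META NAME="DC.Type" CONTENT="chemical">
</HEAD><BODY><P>ATP hydrolysis notes.</BODY></HTML>"""

found = collect_meta(SAMPLE_PAGE)
print(found)
print("flagged as chemical:", found.get("DC.Type", "").lower() == "chemical")
```

An indexer could then decide how heavily to weight a field such as DC.Type relative to body text; that weighting is up to each search engine and is not modelled here.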
On Thu, 15 Apr 1999, Rzepa, Henry wrote: [...]
We are urging people to use the Dublin Core variant of meta-data. I would like to propose that one element of this scheme, namely
<META NAME="DC.Type" CONTENT="chemical">
be used to identify that the page is "predominantly" chemical in its content. [...]
How standardized is this type? As far as I understand the working drafts on the Dublin Core websites

http://purl.org/dc/documents/working_drafts/wd-typelist.htm
http://www.agcrc.csiro.au/projects/3018CO/metadata/dc_tf/type_simple.html

the recommended types are collection, dataset, event, image, interactive resource, physical object, service, software, sound, and text. There are also proposals for something more specific, like "text.thesis.doctoral" (http://sunsite.berkeley.edu/Metadata/structuralist.html).

Where can I find out more about the DC.type "chemical"?

Thanks for any hints,
Hans Benedict

--
Hans Benedict Chemie.DE Information Service mailto:benedict@chemie.de
FU Berlin, Fachbereich Chemie Fon: +49-(0)30-838-3408
Takustrasse 6, D-14195 Berlin, Germany Fax: +40-(0)30-838-3464
http://www.chemie.de/
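The point of the question can be shown with a small check. This sketch assumes that the ten type names quoted above are the complete recommended list, and that a dotted value such as "text.thesis.doctoral" refines its first component; both assumptions come from the message above and the cited proposal, not from a finalised standard. The function name is made up for the example.

```python
# Type names as quoted in the message above (from the working drafts it cites).
RECOMMENDED_DC_TYPES = {
    "collection", "dataset", "event", "image", "interactive resource",
    "physical object", "service", "software", "sound", "text",
}


def is_recommended_type(value):
    """True if the top-level part of a DC.Type value is on the quoted list."""
    top_level = value.strip().lower().split(".")[0]
    return top_level in RECOMMENDED_DC_TYPES


for candidate in ("text", "text.thesis.doctoral", "chemical"):
    print(candidate, is_recommended_type(candidate))
```

Under these assumptions "chemical" is not on the list, which is why a value like it would need either agreement within the community or a different element to carry it.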
Maybe I should also suggest that we all move to XML. Once that is done, the world will change again! You ain't seen nothing yet! Watch this space for some interesting stuff in this area!!
Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY;
As a follow-up to Henry's comment about XML: I know little about XML personally, but readers may find the following useful, and it might help to change a few bad habits that we all have when writing HTML (well, that I have anyway).

HTML is looking to be redefined in terms of XML. See:

http://www.w3.org/MarkUp/#future
http://www.w3.org/MarkUp/Activity.html

and in particular

http://www.w3.org/TR/WD-html-in-xml/

This should not cause panic, but to make the transition easier there are some things that you can start doing *now*. In particular:

- Change to lower case for *all* tags and attributes: so <h1> not <H1>.
- Always terminate elements: so <p>....</p> (you must include </p>). Always use </li> at the end of <li> elements.
- Always use quotes for attributes: so height="20" not height=20.

To quote a few key points from the 3rd URL:

"Due to the fact that XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4.0 [HTML] must be changed.

4.1 New Requirements

4.1.1 Documents must be well-formed.
Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest. Although overlapping is illegal in SGML, it was widely tolerated in SGML-based browsers.

CORRECT: nested elements.
    <p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
    <p>here is an emphasized <em>paragraph.</p></em>

4.1.2 Element and attribute names must be in lower case.
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive e.g. <li> and <LI> are considered to be different tags.

4.1.3 For non-empty elements, end tags are required.
In SGML-based HTML 4.0 certain elements were permitted to omit the end tag; with the elements that followed implying closure. This omission is not permitted in XML-based XHTML. All elements other than those declared in the DTD as EMPTY must have an end tag.

CORRECT: terminated elements
    <p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
    <p>here is a paragraph.<p>here is another paragraph.

4.1.4 Attribute values must always be quoted.
All attribute values must be quoted, even those which appear to be numeric.

CORRECT: quoted attribute values
    <table rows="3">
INCORRECT: unquoted attribute values
    <table rows=3>

4.1.5 Attribute Minimization
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified.

CORRECT: unminimized attributes
    <dl compact="compact">
INCORRECT: minimized attributes
    <dl compact>

4.1.6 Empty Elements
Empty elements must end with />. For instance, <br /> or <hr />.

CORRECT: terminated empty tags
    <br /><hr />
INCORRECT: unterminated empty tags
    <br><hr>"

Dr Mark J Winter (Director of Studies)
Department of Chemistry, The University, Sheffield S3 7HF, England
tel: +44 (0)114 222 9304 fax: +44 (0)114 222 9303
e-m: mark.winter@sheffield.ac.uk
http://www.shef.ac.uk/chemistry/staff/mjw/mark-winter.html
WebElements is the periodic table on the world-wide web: http://www.shef.ac.uk/chemistry/web-elements/
The Sheffield Chemdex is a listing of chemistry sites on the world-wide web: http://www.shef.ac.uk/chemistry/chemdex/
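The rules quoted above can be tried out mechanically. The following is a rough sketch, not an XHTML validator: it assumes that feeding a fragment to a generic XML parser (Python's standard xml.etree.ElementTree) is a fair test of well-formedness, and it adds a separate lower-case check because an XML parser on its own accepts <H1>...</H1> as well-formed. The function name and the dummy <root> wrapper are choices made for the example.

```python
import xml.etree.ElementTree as ET


def check_fragment(fragment):
    """Return a list of problems found in an XHTML fragment (empty if none)."""
    try:
        # Wrap in a dummy root so several sibling elements parse as one document.
        root = ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError as err:
        return ["not well-formed: " + str(err)]
    problems = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            problems.append("element name not lower case: " + element.tag)
        for attribute in element.attrib:
            if attribute != attribute.lower():
                problems.append("attribute name not lower case: " + attribute)
    return problems


# The CORRECT / INCORRECT pairs from the quoted draft, plus an upper-case tag.
EXAMPLES = [
    "<p>here is an emphasized <em>paragraph</em>.</p>",
    "<p>here is an emphasized <em>paragraph.</p></em>",
    "<p>here is a paragraph.<p>here is another paragraph.",
    '<table rows="3"></table>',
    "<table rows=3></table>",
    "<dl compact></dl>",
    "<br /><hr />",
    "<br><hr>",
    "<H1>Title</H1>",
]

for example in EXAMPLES:
    print(example, "->", check_fragment(example) or "ok")
```

One known limitation: without a DTD the parser rejects HTML character entities such as &nbsp;, so real pages would need those handled before a check like this is meaningful.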
participants (4)
- Hans Benedict
- L.M.D.Cranswick@dl.ac.uk
- Mark Winter
- Rzepa, Henry