Dear Netters,

In relation to Henry Rzepa's interesting post, I have more questions than answers.

Why should we explicitly exclude XML and CML? Surely if they have a role to play it is a mistake to try and cover the same ground in HTML 4? Is HTML 4 coming from a different set of concerns than is served by XML?

How many Chemistry Robots are there which can process Henry's SMILES string or the HTML 4 form of the viagra molecule? There is a difference between chemistry on the page (explicitly encoded in HTML) and chemistry available through the www (in databases which may be accessed through the www) -- a suitably designed robot could rely on the search capabilities of the databases it was pointed at. Should a chemistry robot trawl for chemically interesting HTML or offer multiple searches of www sites with publicly available databases? Or both? If both, there is likely to be a trade-off between consistency and completeness. Well, it won't be the first time.

It seems as though it would be a good idea if HTML evolved to support some explicit tagging of Molecular Structures (cf. the provision of key terms for abstracts with conventional publications). But what about the differences/non-equivalence between different encoding systems? Does it matter?

ChemSymphony does have ways of allowing the author/publisher to decide whether, and how much, presentational autonomy to give the user. In effect the publisher can give the user a control applet which allows the user to override the default HTML. But there is a big difference between Java applets and programs which require installation of helper applications on the browser side. The publisher who writes HTML 4 is going to invoke his applets with different code (using the 'object' tags), but it is pretty clear how to use the 'object' tags in place of the 'applet' tags, and the users will then get the benefit of HTML 4. See http://www.w3.org/TR/WD-html40-970708/appendix/changes.html

Admittedly, there may be more of a problem for the client who uses plug-ins or applications to read HTML.

On the issue of 'backwards compatibility': the browsers which support HTML 4 will have to be backwards compatible -- otherwise they will never win acceptance. We should avoid 'applet' and 'font' tags when we start producing HTML 4, but surely the browsers will support these 'deprecated' tags for a long time to come. Provided they can handle Java applets they should be able to handle Chemistry pages which use ChemSymphony. Users may not get all the benefits of HTML 4 in the way that HTML 4 provides it, but maybe they will get enough of what they need.

Adam

--- ---
Adam Hodgkin                   | e-mail: adam@cherwell.com
Chairman                       | Phone: +44 (0)1865 784810
Cherwell Scientific Publishing | Fax: +44 (0)1865 784801
Oxford OX4 4GA, UK             | URL: http://www.cherwell.com
--- ---

chemweb: A list for Chemical Applications of the Internet.
To post to list: mailto:chemweb@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/chemweb/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe chemweb
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
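Adam's remark that it is "pretty clear" how to use 'object' tags in place of 'applet' tags can be illustrated with a minimal sketch. The Python helper below is only an illustration, not anything ChemSymphony ships: the function name, the class name MolViewer.class, and the parameter molecule=viagra.mol are all hypothetical. The markup it emits follows the pattern that HTML 4 gives for embedding a Java applet through OBJECT (codetype plus a "java:" classid).

```python
from html import escape


def applet_to_object(code, width, height, params=None,
                     fallback="This page needs a Java-enabled browser."):
    """Build an HTML 4 OBJECT element for a Java applet.

    Roughly equivalent to <APPLET code=... width=... height=...>.
    All names passed in by the caller are illustrative assumptions.
    """
    lines = [
        f'<OBJECT codetype="application/java" '
        f'classid="java:{escape(code, quote=True)}" '
        f'width="{width}" height="{height}">'
    ]
    # PARAM elements carry applet parameters, as they did inside APPLET.
    for name, value in (params or {}).items():
        lines.append(
            f'  <PARAM name="{escape(name, quote=True)}" '
            f'value="{escape(value, quote=True)}">'
        )
    # The element content is shown by browsers that cannot render the object.
    lines.append("  " + escape(fallback))
    lines.append("</OBJECT>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical applet class and parameter, for illustration only.
    print(applet_to_object("MolViewer.class", 400, 300,
                           {"molecule": "viagra.mol"}))
```

A browser that does not understand OBJECT falls back to the enclosed text, which is one reason a publisher might nest the older 'applet' markup there during a transition period.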
In reply to Adam Hodgkin,
Why should we explicitly exclude XML and CML? Surely if they have a role to play it is a mistake to try and cover the same ground in HTML 4? Is HTML 4 coming from a different set of concerns than is served by XML?
I did not mean to imply that. What I meant to say was that HTML and XML/CML throw up slightly different issues, and in an over-long email, I wanted to leave XML to a separate posting. Relating to HTML however, there do appear to be some 2 million "chemical" documents out there in that form, but only about 0.5% of that number (roughly 10,000) of "high value" legacy molecular descriptors such as molfiles etc. One presumes that 99.5% of these chemical documents either contain little "marked-up" chemistry or it has all disappeared into "unindexable" bit map images. If we managed to replace those 2 million HTML documents and any images they point to with equivalent XML/CML, then the value could go up enormously. For that to happen, the community will have to develop a lot of high quality, low cost tools, and the tangible benefits will have to be ardently sold to everyone.
There is a difference between chemistry on the page (explicitly encoded in HTML) and chemistry available through the www (in databases which may be accessed through the www) -- a suitably designed robot could rely on the search capabilities of the databases it was pointed at. Should a chemistry robot trawl for chemically interesting HTML or offer multiple searches of www sites with publicly available databases? Or both? If both, there is likely to be a trade-off between consistency and completeness. Well, it won't be the first time.
This is an interesting key point. The model followed for the last 100 years is for one or two organisations to collect chemistry on our behalf, consolidate it behind comprehensive, high quality, but largely closed databases, and then sell the "added value" so accrued back to us. Thus accessing either CAS or Beilstein is not cheap. The cost has historically come from the need to employ literally thousands of human abstracters retrieving chemical information in a largely manual manner, from paper sources.

Nowadays, it's still the case that almost all "collections" of molecules are hidden behind a variety of custom databases. Some, such as ChemFinder, are nevertheless freely available, but the really large ones are only available if you are well-resourced. Thus the issue of whether HTML or XML/CML should carry parsable chemical information, or whether most of that information should be closed from view in a proprietary database, must surely be one of the key debates of the next few years.

There is no doubt, for example, that journals are rapidly evolving into large, and mostly proprietary, databases. The tagged chemical content of these databases is still relatively low, but in one scenario a few very large publishers might well control almost all access to such databases, and one might imagine in that scenario that as the "quality" of the chemical access goes up, so will the costs.

There might also be a "third way" (a term currently popular in British politics!). Working with XML/CML teaches us the value of molecular components, i.e. well defined chemical objects. Such object collections clearly need quality control mechanisms, and a means to create a distributed decentralised system. Whilst it's possible to envisage the Web as providing that infrastructure, we found it interesting to investigate the "object store" as a "third way", and the possibilities of creating a globally distributed "chemical object store".
If you want an (approximate) analogy, ponder the amazing infrastructure that has been built up on your behalf to direct you in about 1-2 seconds to the resource behind a simple URL, i.e. the distributed Domain Name System (DNS). Our experiments in "chemical object stores" or "COS" are to be found at http://www.ch.ic.ac.uk/vchemlab/cos/

Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY; mailto:rzepa@ic.ac.uk; Tel (44) 171 594 5774; Fax: (44) 171 594 5804. URL: http://www.ch.ic.ac.uk/rzepa/
At 09:42 AM 27/5/98 +0100, Rzepa, Henry wrote:
In reply to Adam Hodgkin,
Why should we explicitly exclude XML and CML? Surely if they have a role to play it is a mistake to try and cover the same ground in HTML 4? Is HTML 4 coming from a different set of concerns than is served by XML?
I have just come back from XML98 in Paris where Jean Paoli from Microsoft and Bernard Feinman from Netscape gave plenary presentations on their support for XML. They clearly see HTML and XML co-existing. One of the highlights was that Microsoft showed their Office suite using encapsulated XML. For example, JP showed a multivariate table in Excel held *as XML*. It would be trivial to export this and put it into (say) a multivariate statistical package or draw a multi-dimensional data view for navigation. With RTF and HTML tables this is simply impossible.

MS highlighted the role of information components. They clearly see each discipline as providing its own specific tools for managing domain-specific information, and they specifically mentioned MathML and CML as the way that they saw things going.

We can take it as axiomatic that the easiest way for managing technical information over the WWW from now on will be using XML-based components. These will, if defined according to the Namespace proposal, interoperate with each other and will be easy to use in the next generation of browsers.

I am continuing to refine the way that CML is defined and how it interoperates with other XML components. Later this year we should have a clear specification for XLink (the hypermedia spec) and this will allow many exciting things to be done generically (e.g. assignment of peaks to functional groups, descriptions of reaction mechanisms, highlighting active sites and annotation against sequence).

The main thing that is as yet undecided is how to manage simple data types (Integer, String, Float, etc.). Since these are very important to chemistry, they underpin CML. (Coordinates, melting point, molecular weight, etc. all require such description.) It is likely that I shall announce an interim version shortly which will be designed to be refined in the face of further XML developments. CML is a starting point for the molecular community - not a finished product.
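As a rough illustration of the two points above (a table held as XML is easy to hand to other software, and simple data types such as Integer, String and Float need describing), here is a minimal Python sketch. The element and attribute names (table, row, cell, name, type) are invented for this example and are not taken from CML, from the Microsoft demonstration, or from any XML data-typing proposal; only the standard-library parser is real.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag and attribute names are assumptions
# made for illustration, not part of CML or any published schema.
DOC = """
<table>
  <row>
    <cell name="compound" type="String">benzene</cell>
    <cell name="meltingPoint" type="Float">5.5</cell>
    <cell name="atomCount" type="Integer">12</cell>
  </row>
  <row>
    <cell name="compound" type="String">naphthalene</cell>
    <cell name="meltingPoint" type="Float">80.3</cell>
    <cell name="atomCount" type="Integer">18</cell>
  </row>
</table>
"""

# Map each declared simple type to a Python converter.
CONVERTERS = {"String": str, "Integer": int, "Float": float}


def read_rows(xml_text):
    """Return one dict per <row>, with values converted by declared type."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for row in root.findall("row"):
        record = {}
        for cell in row.findall("cell"):
            convert = CONVERTERS[cell.get("type")]
            record[cell.get("name")] = convert(cell.text)
        rows.append(record)
    return rows


if __name__ == "__main__":
    rows = read_rows(DOC)
    # Once typed, the numbers can go straight into a calculation.
    mean_mp = sum(r["meltingPoint"] for r in rows) / len(rows)
    print(rows)
    print("mean melting point (deg C):", mean_mp)
```

Because each cell declares its type, the consumer does not have to guess whether "12" is a count or a label, which is the kind of ambiguity that an untyped HTML table leaves to the reader.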
Anyone who has looked at the W3C processes will see how they define a set of goals and how there is an iterative, but tightly controlled, process for proceeding. Henry and I are actively planning how to support this next phase. You will see it on this list first :-)

P.

BTW Netscape and MS both stressed the importance of open processes. They highlighted the roots of XML in free movement of text on the web. MS said 'data should be free' and NS said 'code should be free'. I agree with both these sentiments. It is clear that XML does not reduce the opportunity for competition among vendors - but it raises the level of competition above fighting by using syntactic incompatibility. We need to move in the same direction.
Glyn Moody has written an interesting and informative article about XML for the New Scientist: http://www.newscientist.com/ns/980530/xml.html which spotlights CML among other markup languages. It is factually correct (not always the case with XML articles :-( ) and highlights the challenge of providing universal semantics and metadata.

One of the concerns (Mark Pesce, VRML) is that XML could lead to 'Balkanisation' of the Web, through designers sticking to their own tags. I take the opposite view - there could hardly be anything much worse than the current syntactic and semantic chaos in chemical informatics.

One area of particular interest is schemata for metadata. Ora Lassila (Nokia) notes that "it takes a long time to get enough representatives from any community to agree on anything" and "expects that rough and ready schemata will become standards by default simply because people will start using them".

There are many quotes from supporters, ending with P.G.Bartlett (ArborText): "XML will prove to be one of the top ten technological innovations of the first century of computing". I'll buy that.

P.
participants (3)
- Adam Hodgkin
- Peter Murray-Rust
- Rzepa, Henry