On Tue, 5 Dec 1995 bruno@chemcrys.cam.ac.uk wrote: [...]
possible to conceive an automatic method of generating hypertext documents from 'straight' text. I have also had the dubious honour of attempting to convert a Masters dissertation into hypertext (as part of one of the aforementioned studies). It isn't always easy especially if you are not allowed to rearrange the text in any way.
There are two distinct issues here: markup and hypertext. I interpret hypertext to mean a document whose structure is enhanced (hopefully) by the addition of links from one part of the document to another. These links are additional to the normal structuring tools we learn when we read conventional books - so that an index is not regarded as hypertext, though logically it is. Like Henry I agree that Bush is normally credited with hypertext, though I think that Diderot and the encyclopaedists are one of the major epics in the globalisation of knowledge. The recent use of the term has stressed this globalisation - i.e. that the hyperdocument can be distributed over many physical entities.

If HTML only contained links (<A HREF=to>, <A NAME=from>) it would be pure hypertext, but it contains some minimal markup as well. Apart from the formatting, HTML 2.0 marks up:

- data containers (UL and OL)
- document structuring (H1...H6). They didn't get this right!
- TITLE and ADDRESS
- IMG

This markup defines these elements (sic) as having a particular role in the document, and it is legitimate to use them for searching, indexing, restructuring, etc - though this is rarely possible with the present diversity of authoring tools.
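As an illustration of using structural markup for indexing, here is a minimal sketch that pulls the TITLE and H1...H6 text out of a document to form a crude outline. It assumes Python and its standard html.parser module; the names OutlineParser, extract_outline and SAMPLE are invented for this example and are not part of any tool mentioned in this post. It only works as well as the authoring tool's use of headings allows.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect the text of TITLE and H1..H6 elements; ignore everything else."""

    # html.parser reports tag names in lower case.
    STRUCTURAL = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []     # (tag, text) pairs in document order
        self._current = None  # structural tag we are currently inside, if any
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in self.STRUCTURAL:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so each entry is a single line.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((tag, text))
            self._current = None


def extract_outline(html_text):
    """Return a list of (tag, text) pairs for the title and headings."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


# Hypothetical sample document, written in the loose style of HTML 2.0.
SAMPLE = """<HTML><HEAD><TITLE>Chemical Markup Language</TITLE></HEAD>
<BODY><H1>Introduction</H1><P>Some text.
<H2>Molecules</H2><UL><LI>benzene<LI>water</UL>
<ADDRESS>author@example.org</ADDRESS></BODY></HTML>"""

if __name__ == "__main__":
    for tag, text in extract_outline(SAMPLE):
        print(tag, text)
```

The same approach would extend to ADDRESS or IMG, but a document whose author used headings purely to get large type would yield a meaningless outline.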
One of the ideas of hypertext is that the concept of a page is done away with. I still tend to view hypertext documents in terms of pages; I wonder if others do.
The terminology is common and IMO quite useful. However there is a big contrast between the supporters of what I call CONTENT and FORM. Form is (at present) the most highly desired - can X send Y a 'page' that looks exactly how X wants it? This is where CENTER, BLINK, etc raise such passions. Almost all discussion on the HTML-WG is about form.

It is content that concerns me more. In chemistry I believe it matters critically that information is transmitted accurately, and that its display is (relatively) less important. HTML is very forgiving about variations in syntax, so, for example:

    Please send <CURRENCY COUNTRY=CANADA>10 dollars

will be rendered (without comment) by all browsers simply by omitting the tag:

    Please send 10 dollars

This will not do for chemistry! I have addressed this in Chemical Markup Language (CML) which is now at:

    http://www.dl.ac.uk/CBMT/cml/

CML concentrates on information structure and content and very little on form. I shall be adding more discussion at that site of the flavour of this posting.

If you are converting *.txt to *.html you need to ask yourself WHY? If it's simply for formatting so that it's nicer to look at when downloaded, then a trivial tool will do. If, however, you want an INDEX or other markup and the author hasn't included that, it's an expensive operation and there are not many shortcuts. If you have the source (e.g. LaTeX or Word), there are tools to convert to semantically void HTML.
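The CURRENCY example can be made concrete with a small sketch. It assumes Python's standard html.parser module; ForgivingRenderer, render and the KNOWN_TAGS set are invented for this illustration (KNOWN_TAGS is an arbitrary subset, not the HTML 2.0 DTD). The renderer imitates a forgiving browser by emitting only the text, while recording the tags it did not recognise so that the lost information is visible.

```python
from html.parser import HTMLParser

# Illustrative subset of recognised tags; an assumption, not the HTML 2.0 DTD.
KNOWN_TAGS = {"p", "b", "i", "a", "ul", "ol", "li"}


class ForgivingRenderer(HTMLParser):
    """Imitate a forgiving browser: keep the text, silently skip every tag."""

    def __init__(self):
        super().__init__()
        self.text = []     # text fragments in document order
        self.unknown = []  # (tag, attributes) for tags outside KNOWN_TAGS

    def handle_starttag(self, tag, attrs):
        # A real browser would simply drop the tag here; we also log it.
        if tag not in KNOWN_TAGS:
            self.unknown.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def render(html_text):
    """Return (displayed text, list of unrecognised tags with attributes)."""
    renderer = ForgivingRenderer()
    renderer.feed(html_text)
    renderer.close()  # flush any text still buffered by the parser
    return "".join(renderer.text), renderer.unknown


if __name__ == "__main__":
    shown, dropped = render("Please send <CURRENCY COUNTRY=CANADA>10 dollars")
    print(shown)
    print(dropped)
```

The displayed text carries no trace of which country's dollars were meant; only the second return value shows what was discarded. A content-oriented application would have to treat an unrecognised tag as an error, or at least a warning, rather than pass over it.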
I often find something on the WWW that I want to print off and read away from the computer. Sometimes this isn't easy, particularly if a document is spread across more than one HTML file (as sometimes happens with papers presented at electronic conferences).
I agree. When I want people to download something (as for Chemical Markup Language) I include a *.tar.gz for the appropriate part of the distribution.
These issues possibly apply more to resources such as journals than they do to some other applications, but:
They apply across the board! It's a culture change that we have to make. It will take at least half a generation.
How 'ready' is the scientific community to change the way it approaches 'written' texts?
This depends (IMO) on our education. Books (as opposed to scrolls, clay tablets) have been in common use for ca. 500 years and a large part of our education is given to teaching people how to use them. Teenagers are now much more familiar with electronic metaphors (through keyboards, screens, etc). Mine read much less paper than we used to. They love the WWW.
How prepared is the scientific community to glean information from a computer screen and not worry about having a hard copy?
It depends on what they want to do with it. There are still several things we can't do on screen (annotation is one, reading in the bath another). But who uses the CSD printed books for searches if they have (free) access to an on line version?
To what extent are these barriers to the promotion of WWW resources?
There are many things that may/will happen outside our community (better screens, new metaphors), but WE must concentrate urgently on getting our discipline-specific information in order. This will take 10-20 years.
Are there issues relating to the design of HTML documents that we need to consider in relation to this, be it in the conversion of existing 'straight' texts or in the design of HTML documents from scratch?
Yes. At present badly thought out hypertext is a nightmare. IMO it works best when hypertext maps well onto conventional paper structures. I have tried to come up with some archetypes and have generalised this to four:

- serial book (e.g. detective novel). Read from page 1 to page 200
- dictionary (phone book, CSD, Swissprot). Locate a precise chunk of information by (alphabetical) index
- tree (technical manual, e.g. brakes, engine, lights can all be read independently)
- anthology (literature, or journal, where items are distinct but have a common theme)

The electronic era has added the 'grep' - i.e. searching unstructured documents. I'd be very interested to know whether other people have additions to this list. Are there any new ones which have arisen over the last year or two? I've just thought of a fifth: the map.

P.

BTW Chemical Markup Language is now in a reasonable state to look at. There is a browser which can be compiled under UNIX (and we are trying for PC and Mac). There are many examples, including molecular data files, and I am writing extensive documentation. Feedback will be most valuable.

Peter Murray-Rust, Glaxo Research & Dev. (pmr1716@ggr.co.uk); (BioMOO: PeterMR)
Birkbeck College, ubcg09q@cryst.bbk.ac.uk, CBMT/Daresbury mbglx@seqnet.dl.ac.uk
http://www.cryst.bbk.ac.uk/PPS/index.html, http://www.dl.ac.uk/CBMT/HOME.html

-----
chemweb: A list for Chemical Applications of the Internet.
To unsubscribe, send to listserver@ic.ac.uk the following message;
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)