This is being sent to CHMINF-L, CHEMWEB, and CHEMIND-L. (It's long.)
------------>

Data Needs of Academic Research on the Internet

Gary Wiggins
Indiana University Chemistry Library
wiggins@indiana.edu

Data on the Web

"All in all, the chemical data now available on the web is in a different class from the data found in refereed journals, critical reviews and books from reputable publishers."
- David Lide (CHMINF-L, 30 October 1996)

The above comment was one of several received in reply to questions sent to three chemically-oriented discussion lists in the fall of 1996. The questions were asked in preparation for a lecture and demonstration delivered at the National Institute of Standards and Technology on December 4, 1996. Most of the information in this paper was included in that presentation.

Questions were sent to CHMINF-L, CHEMWEB, & CHEMIND-L in late October 1996. They were designed to:
- Gauge the extent of inaccurate data in Web databases
- Define the characteristics of data on the Web
>> Sources of data
>> Need for standardization of data formats
- Determine the best guides to data.

Respondents to the survey noted these problems with the accuracy of data on the Web:
- Units are frequently omitted
- Transcription errors are often encountered
- This leads to a need to find redundant data against which values can be checked
- Very few sources have quality assurance statements
- Few of the Web data sites give the source of the data
- When they do, the data are likely to be copied from outdated sources.

Other Survey Results

Several people commented on efforts or practices that will likely improve the quality of data on the Internet, including:
- Standardization efforts:
>> CLIC, Chemical MIME, CML
>> Roles for IUPAC, CODATA: certification?
(One person, however, questioned whether standardization efforts were worthwhile.)
- Efforts to share data or to cooperatively compile data sources
>> Open Molecule Foundation
>> Molecule of the Month
>> Reciprocal Net
>> Structure and Reactivity Across the Periodic Table
- Provision of a minimal level of auxiliary information (metadata)
>> authorship
>> units
>> conditions of measurement
>> references to primary and secondary sources of data
- Use of standard symbols and terminology
- Guidelines on how to handle special characters.

General Comments on Data on the Web

"While some might argue that the Internet is designed to make information in a single location accessible to users around the world, the large number of mirrored sites already in existence points out the Net's inadequacy."
- Byte, December 1996

A number of steps are needed to improve the quality of data found on the Web. Among them are:
- Mechanisms to synchronize changes made at multiple sites
- Faster access to resources
- More secure transactions
- Progress on chemical metadata standards
- Interoperability of chemical plug-in programs.

Some Goals for Improving Data on the Web

- Assemble the most reliable data available
- Arrange data for easy retrieval
- Provide a "SuperIndex" of available data sources
- Establish criteria for evaluation of data sources:
>> descriptions of physical theories on which data are based
>> full references to literature
>> format of the database
>> search capabilities

How to Find Data Now

A second part of the NIST presentation was a look at how to find data on the Web today. One person pointed me toward Alexander Lebedev's "Best Search Engines for Finding Scientific Information in the Web" (http://www.chem.msu.su/eng/comparison.html). He searched 11 Web search engines and concluded:
- Excite retrieves about as many documents as Altavista
- Metacrawler is the most powerful search engine for SATI
- Two of the search engines are not being updated.

Lebedev also compared the Web searches to INSPEC results for 1994 & 1995 on the same topics.
He found:
- Only 5-10% of relevant information is on the net
- The Web is particularly good for supplemental information:
>> on authors
>> on their work and research projects
>> on foundations supporting them.

Besides using search engines, these are some other ways to find data using the Internet:
- Submit the question to a knowledgeable source
- Consult lists of sources (guides)
- Try known sources
- Try comprehensive chemistry guides.

Lists of Sources (Guides)

CIS-IU (Chemical Information Sources from Indiana University)
http://www.indiana.edu/~cheminfo/ca_accc.html
http://www.indiana.edu/~cheminfo/ca_ppi.html

Databases for Atomic and Plasma Physics
http://plasma-gate.weizmann.ac.il/DBfAPP.html

IOP's Software and Data Page
http://www.iop.org/Physics/Resources/phsoft.html

Known Sources

NIST Physics Laboratory
http://physics.nist.gov/PhysRefData/contents.html

Sheffield ChemPuter
http://www.shef.ac.uk/~chem/chemputer/

Biocatalysis/Biodegradation Database
http://dragon.labmed.umn.edu/~lynda/index.html

Comprehensive Chemistry Guides

Chemfinder
http://chemfinder.camsoft.com

WWW Chemical Structures Database
http://schiele.organik.uni-erlangen.de/services/webmol.html

SpaceCrunch
http://www.tripos.com/spacecrunch/

Other Examples

University of Texas's ThermoDex
http://www.lib.utexas.edu/Libs/Chem/info/thermodex/

Table of the Properties of 200 Linear Macromolecules and Small Molecules
http://funnelweb.utcc.utk.edu/~athas/databank/intro.html

Chemical errors found on WWW sites; A discussion of problems encountered while creating the ChemFinder WebServer database
http://www.camsoft.com/chemfinder/errorsfound.html

Internet Demos at NIST

CIS-IU
ca_accc.html
Go to Analytical Chemistry page, then to MS Links at SIS, then Dave's Math Tables
www.sisweb.com/math/tables.htm

NMR Information Server at U of Florida
micro.ifas.ufl.edu/
playing Happy Birthday to You on an NMR Spectrometer

Database of Core-Edge (Inner-Shell) Excitation Spectra of Gas Phase Atoms and Molecules
xray.uu.se/hypertext/corexdb.html
SEARCH naphthalene

Spin trap Data Base
alfred.niehs.nih.gov/LMB/stdb
ENTER THE DATABASE doesn't work, but HIPPO does

Electron Paramagnetic Resonance at Bristol
emrs.chm.bris.ac.uk/
Beautiful background! In "About the Database" in the Introduction, Spectra examples, show the example Cu(II) (nothing else works!)

Look at IU Molecular Structure Center's Reciprocal Net
www.cica.indiana.edu/~recip/
www.indiana.edu/ReciprocalNet.html

Molecules R Us
molbio.info.nih.gov/cgi-bin/pdb
Search dehalogenase (E.C.3.8.1.5)

NIST Chemistry WebBook
webbook.nist.gov/chemistry
Look for 91-56-5

AIRSITE
ozone.sph.unc.edu
Has "Environmental Data", but it's "under construction"

THERMODEX
www.lib.utexas.edu/Libs/Chem/info/thermodex/
Search Gibbs Free Energy and organic

Chemfinder
chemfinder.camsoft.com
Search MEK

WWW Chemical Structures Database
schiele.organik.uni-erlangen.de/services/webmol.html
Search MEK, then 2-butanone

SpaceCrunch
www.tripos.com/spacecrunch/

Molecule of the Month
www.bris.ac.uk/MOTM/motm.html

-----
chemweb: A list for Chemical Applications of the Internet.
Archived as: http://www.ch.ic.ac.uk/hypermail/chemweb/
To unsubscribe, send to listserver@ic.ac.uk the following message;
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)