Databases in the Field: A Dissenting View

B.R. Julian, U.S. Geological Survey, Menlo Park
G.R. Foulger, University of Durham, UK

Introduction

PASSCAL is currently devoting a major effort to developing PDB (PASSCAL Data Base), a collection of software to manage data from seismic field projects and to convert them to SEED format for archiving at the IRIS DMC. Given the requirement to supply data to the DMC in this format, it seems that it will become effectively mandatory to use PDB in all PASSCAL-supported experiments in future.

On the basis of extensive field experience, we think this effort is misguided. PDB is far too complicated for use under field conditions, a situation that will persist even if, at some future date, it is free of bugs and fully documented. The complex organization of the DMC is best suited for data from permanent and stable seismograph networks, and not for a forever-changing rag-bag of short-term field experiments conducted for different purposes by different people with different levels of expertise. Imposing the use of such a complicated system upon field projects jeopardizes their primary goal, which is collecting scientifically useful data. Archiving data for possible future use is important, but it can and should be done without threatening the primary objectives of field experiments in the name of facilitating later access to data by armchair seismologists unconnected with the original experiments.

The Iceland Hotspot Experiment

Our opinions are based on personal exposure to PDB. We are currently operating a 35-station network of PASSCAL broadband instruments in Iceland for a period of about two years. The network was installed in the summer of 1996 by a group that included three PIs, a post-doctoral research assistant, two postgraduate research assistants and two Ph.D. students. The instruments record three channels continuously at 20 sps and require disk changes about every two months.
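The recording parameters above imply a substantial data volume, which can be estimated with a short calculation. The sketch below is only a rough illustration: the station count, channel count and sample rate come from the description above, but the figure of 4 bytes per sample (uncompressed) and the 60-day service interval are our assumptions for the purpose of the example, not measured values from the experiment.

```python
# Rough data-volume estimate for the network described in the text.
STATIONS = 35                # from the text
CHANNELS = 3                 # from the text
SAMPLE_RATE_SPS = 20         # from the text
SECONDS_PER_DAY = 86_400
BYTES_PER_SAMPLE = 4         # ASSUMPTION: uncompressed 32-bit samples
DAYS_BETWEEN_SERVICE = 60    # ASSUMPTION: "about every two months"


def samples_per_station_day(channels=CHANNELS, rate=SAMPLE_RATE_SPS):
    """Samples recorded by one station in one day of continuous recording."""
    return channels * rate * SECONDS_PER_DAY


def bytes_per_station_day(bytes_per_sample=BYTES_PER_SAMPLE):
    """Approximate bytes written by one station per day (assumed sample size)."""
    return samples_per_station_day() * bytes_per_sample


def network_bytes_per_day(stations=STATIONS):
    """Approximate bytes produced by the whole network per day."""
    return stations * bytes_per_station_day()


def bytes_per_station_per_service(days=DAYS_BETWEEN_SERVICE):
    """Approximate bytes accumulated on one station's disk between visits."""
    return days * bytes_per_station_day()


if __name__ == "__main__":
    print("samples/station/day:", samples_per_station_day())
    print("MB/station/day:", bytes_per_station_day() / 1e6)
    print("MB/network/day:", network_bytes_per_day() / 1e6)
    print("GB/station/service interval:", bytes_per_station_per_service() / 1e9)
```

Under these assumptions each station accumulates on the order of a gigabyte between disk changes. Any compression applied by the recorders would reduce these figures.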

Data dumping and archiving are done at the Geophysics Division of the Meteorological Office of Iceland, in Reykjavik. The Geophysics Division is excellently equipped with a network of Sun workstations and Internet facilities and staffed with a large group of experts in both seismology and computing. The computer support is better than at most American university departments. The network and data dumping/archiving work is conducted year-round by the two postgraduate research assistants.

PDB was installed on two Sun workstations supplied by PASSCAL, prior to their shipment to Iceland, and during the network installation the PASSCAL staff logged on to them frequently via the Internet to debug and maintain the software. Despite these favorable circumstances, we were unable to implement PDB in the sense that SEED volumes could be routinely generated during the three-month network installation period. Part of the reason was the immature state of the system and documentation, but the main reason was simply the complexity of the task. As is usual in field projects, we were swamped by the overwhelming amount of work involved in installing stations, checking and servicing them, getting the 20-Gigabyte/two-workstation computer system running in Reykjavik, training the research assistants in UNIX and simple data-dumping/archiving work, dealing with customs, project administration, accounting, liaising with colleagues, etc. We had only three months in which to set up this $1 million experiment, and to have spent more time on PDB would have jeopardized the data collection.

In order to satisfy the requirements of PASSCAL for SEED archives, we arranged a second trip to Reykjavik for a week in January 1997, along with a PASSCAL software engineer familiar with PDB. However, despite the even-better conditions, it took a whole week to get PDB working. Our research assistant was trained to operate PDB during the second week and has since then been able to archive data successfully, provided that consultations and Internet contact with PASSCAL staff are available on a daily basis. Some of the PDB programs are, however, very time-consuming to run. For example, it takes 11 hours to archive to tape four field-days of data. Backlogs of archiving thus build up when the research assistants go to the field to swap disks and when key PASSCAL personnel (whose efforts and dedication to duty we cannot praise too highly) are unavailable for a few days.

Scientific field experiments seldom enjoy the logistical and infrastructure advantages that the Iceland HOTSPOT project does, and are usually much shorter-term. Often they must operate in remote areas without adequate transportation and telephone (to say nothing of Internet) services. Personnel often rotate during field experiments, and vary greatly in their geophysical and computer expertise. Simply getting to stations for periodic servicing is nearly always a taxing undertaking. Networks must be kept operating in the face of hardware failures, software failures, bad weather, political and personnel problems, shortages of money, vandalism and safety problems. A single failure (for example of a field computer) may cause the entire experiment to grind to a halt.

Field hardware and software need to be as simple and trouble-free as possible. Any unnecessary burden placed on field workers will inevitably reduce the amount and quality of data that can be collected.

Computer database management systems are enormously complicated collections of software. Keeping a database operating normally requires a highly trained, full-time "Database Administrator". A DBMS can provide useful services such as simultaneous multi-user data entry and access, guaranteed data consistency, audit trails, and user-specified variations in the data organization. Most of these services are irrelevant to field seismic experiments.

The data-management requirements of a field seismic experiment are:

PDB addresses the first requirement only, and in a way not optimized for field experiments. In addition to keeping track of useful data, PDB also includes a huge body of superfluous information such as the start and end times of every data block, information already stored on the data tapes and in the log files. Well-designed field service sheets and data scanning charts are better suited to keeping track of hardware and data. Careful use of such charts is much more likely to lead to benefits such as recognition of faulty units than is burying the information in a computerized database. Transferring data to tape takes vastly more computer and tape drive time under PDB than under simpler approaches.

The main justification for PDB apparently is to facilitate conversion of seismograms to SEED format for storage at the IRIS DMC, and the complexity of SEED is the main reason for the cost and complexity of PDB. The tail is wagging the dog. The success of PASSCAL-supported experiments is being jeopardized for the sake of storing the data in the particular form desired by some researchers.

Suggestions

It is clear from our Iceland experiment that operating PDB is a major undertaking. Setting PDB up is a time-consuming task best done by an expert; learning how to use it requires a substantial training course; entering the data requires a great deal of extra work; and all these tasks require close liaison with PASSCAL staff via a reliable Internet connection. Archiving data for future scientific re-scrutiny is important, but it must be done in a way that does not jeopardize the acquisition of the data in the first place.

These comments reflect the views of the authors, and not necessarily those of other members of the Iceland HotSpot consortium.

