GeoSciCloud: Exploring the potential for hosting a geoscience data center in the cloud
The IRIS Data Management Center (DMC) has operated a public repository of seismological data for three decades, supporting thousands of researchers. Since its founding, the DMC has run its own infrastructure to provide the computational and storage resources its mission requires. In the GeoSciCloud project, a Building Block project supported by EarthCube¹, the DMC is deploying a subset of its archive and key software components into two cloud environments. This project will allow the DMC to evaluate the realities of operating in the cloud and to explore the potential advantages and disadvantages.
The two cloud environments selected for this project are Amazon Web Services (AWS) and XSEDE’s Jetstream and Wrangler systems. The XSEDE resources are operated on behalf of NSF by Indiana University jointly with the Texas Advanced Computing Center. The DMC has deployed a ~40 terabyte test data set and a subset of its web service-based data access architecture to both environments and is conducting an extensive evaluation of the capabilities of these deployments. To ensure these systems support and, ideally, improve upon real-world research use cases, the DMC is collaborating with scientists who have performed their own tests designed to meet their research needs.
A promising, expected gain from cloud-like environments over DMC-operated systems is the ability to scale out to handle more simultaneous users, for both storage I/O and processor-intensive tasks. Another potential advantage is providing data within, or very near to, a powerful computing environment that researchers may also use. A key aspect to evaluate is the cost of the cloud environments relative to the DMC’s own infrastructure.
Preliminary results from testing thus far indicate that both cloud environments can deliver raw and processed data at much higher levels of concurrent requests. Results also indicate that transmission of data across the internet quickly becomes the limiting factor as data volume increases. Comparison between the two cloud environments is illustrated in the following figures:
1 The GeoSciCloud project is supported by the National Science Foundation’s (NSF) EarthCube program, ICER-1639719.
EarthCube Building Blocks: Collaborative Proposal: Deploying Multi-Facility Cyberinfrastructure in Commercial and Private Cloud-based Systems
by Chad Trabant (IRIS DMC) and Mike Stults (IRIS Data Management Center)