# What is CyVerse?

CyVerse started out in 2008 as the NSF-funded 'iPlant Collaborative' project to help meet the changing needs of the life sciences, which have increasingly heavy computational demands (think protein folding, genetic phenotyping, etc.) but have historically lacked high-performance computing facilities. The project was rechristened 'CyVerse' in 2015.

The two co-Is of the grant, Nirav Merchant and Eric Lyons, are life scientists here at UA and offer the course "Applied Concepts in Cyberinfrastructure". They had found that researchers were "actively avoiding advanced computing for their research projects because they viewed it as inflexible, complicated, frustrating, and time-consuming."

CyVerse offers:
1. Cloud computing
2. Data storage
3. Support

# Why should an astronomer care about it?

CyVerse cyberinfrastructure has the serendipitous effect of providing computational resources to basically anyone here at UA and at >8,000 participating institutions.

Possible reasons to care: 
1. If you have computational needs that are greater than what your office machines can provide, but less than what would usually pass for HPC jobs
2. If you want to calculate something now, and don't want to put your job in a queue
3. If you want flexible access remotely or from low-performance machines (basically all you need to do is 'ssh')
4. This infrastructure serves as a lead-in for addressing needs that span the community, like interoperability and reproducibility.

I (Eckhart) first heard about it at a Software Carpentry workshop. CyVerse made a brief appearance at a talk at the Internal Symposium in fall 2016. 

Using CyVerse computational resources, I performed MCMC and image-processing-related computations for three publications (either published or in prep.). Using resources for which I did not require any extra human approval, I have shortened high-contrast imaging dataset reductions from >10 hours of wall time to ~30-40 minutes by distributing the jobs across several cores.

Ya-Lin Wu (grad student close to graduating) and Jared Males (new research professor here) used CyVerse to do image processing involving frame-by-frame parameterization and subtraction of host star PSFs, and fake planet injection for determination of sensitivity or brightnesses. In Males' case, this shortened reduction times from four months to two days. (This was actually a project of the "Applied Concepts in Cyberinfrastructure" class.)

See, for example,

### Cloud computing:

 - [This article](http://www.cyverse.org/single-post/2016/03/17/In-Search-of-a-Planet) on Males et al.'s work with CyVerse. There's also a video featuring the work done by "Jared Malice" et al. [here](https://www.youtube.com/watch?v=8tILpvMINYU&feature=youtu.be).
 - 'High-Contrast Imaging in the Cloud with klipReduce and Findr' (Haug-Baltzell+ 2016)
 - 'An ALMA and MagAO Study of the Substellar Companion GQ Lup B' (Wu+ 2017)

### Data curation:

[This arXiv post](https://arxiv.org/pdf/1802.03629.pdf) from two days ago is about [Astrolabe](http://astrolabe.arizona.edu), an outgrowth of CyVerse to curate astronomical data and prevent it from "going dark". (Ben Weiner is involved in this.)

# Storage space with CyVerse Data Store

[CyVerse Data Store](http://www.cyverse.org/data-store) gives you "100 GB of data storage, with the ability to request more". Large file transfers can be sped up by using multiple threads with iRODS iCommands.
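
As a rough illustration of a multi-threaded transfer, here is a minimal Python sketch that assembles an `iput` command (the iCommands upload tool). It assumes iCommands is installed and that you have already authenticated with `iinit`. The helper name `build_iput_command`, the local directory, and the `your_username` collection path are placeholders for illustration, not anything prescribed by CyVerse.

```python
import shlex


def build_iput_command(local_path, remote_collection, num_threads=4, recursive=False):
    """Return the argument list for an `iput` upload (hypothetical helper).

    -P prints progress, -N sets the number of transfer threads,
    -r uploads a directory recursively.
    """
    cmd = ["iput", "-P", "-N", str(num_threads)]
    if recursive:
        cmd.append("-r")
    cmd += [local_path, remote_collection]
    return cmd


# Placeholder paths: swap in your own directory and Data Store collection
cmd = build_iput_command("deposit/", "/iplant/home/your_username/deposit",
                         num_threads=8, recursive=True)
print(shlex.join(cmd))

# To actually run the transfer (needs iCommands and a prior `iinit`):
# import subprocess
# subprocess.run(cmd, check=True)
```

The command is only printed here so that you can inspect it before launching a large transfer.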

# Data curation with Astrolabe

Follow the instructions [here](http://astrolabe.arizona.edu/index.php/contribute/). (One option is to upload data to Astrolabe from Data Store.)

# Virtual machines with CyVerse Atmosphere

[CyVerse Atmosphere](http://www.cyverse.org/atmosphere) is CyVerse's cloud-computing arm. With it, you can slap together a remote machine with a wide choice of operating system and hardware. 

Computational work is measured in 'AU' (allocation units?), which you use up and can request more of. In my experience, a new request is granted anywhere from a couple of hours later to the next weekday morning.

# Put together an instance

1. First, obviously, open a CyVerse account.
2. Log in to CyVerse Atmosphere, and go to Projects -> Create New Project. A 'Project' can encompass multiple instances and volumes.
3. Once you're in the new Project, go to New -> Instance. An 'Instance' is a fusion of the hardware, operating system, and software of your virtual machine.
4. Choose the operating system, number of cores, hard drive size, and RAM that you want.
5. Deploy it! (This takes some minutes to build and network.)
6. Use the IP address to ssh into the instance, or use the 'Web Shell' option. (The IP address changes each time you suspend and resume an instance.)

For lots of details, do not fail to consult the CyVerse Wiki.
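
Because the IP address changes after each suspend and resume, it can help to keep it in one variable and build the `ssh` and `scp` commands from it. The following is a minimal sketch. The function names, the username, the remote path, and the IP address (taken from a documentation-only address range) are all placeholders.

```python
def build_ssh_command(ip_address, username, remote_command=None):
    """Return an argument list for ssh-ing into the instance (hypothetical helper)."""
    cmd = ["ssh", f"{username}@{ip_address}"]
    if remote_command:
        cmd.append(remote_command)  # run one command remotely instead of opening a shell
    return cmd


def build_scp_command(local_path, ip_address, username, remote_path):
    """Return an argument list for copying a local file to the instance."""
    return ["scp", local_path, f"{username}@{ip_address}:{remote_path}"]


# Placeholder values: update INSTANCE_IP every time you resume the instance
INSTANCE_IP = "192.0.2.10"
USERNAME = "your_username"

print(" ".join(build_ssh_command(INSTANCE_IP, USERNAME)))
print(" ".join(build_scp_command("lm_171002_00001.fits", INSTANCE_IP, USERNAME, "retrieve/")))
```

The lists can be passed to `subprocess.run(...)` if you would rather launch the commands from Python than from a terminal.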

# Configure

Once you ssh in for the first time, run the command

```
ezj
```

The CyVerse wizards added this to install a lot of the popular packages you might need. (I recommend also installing Anaconda as an all-in-one installer and an environment manager. The latter will enable you to switch on-the-fly between versions of Python. For installation instructions, go [here](https://conda.io/docs/user-guide/install/index.html), and see info on managing environments [here](https://conda.io/docs/user-guide/tasks/manage-environments.html#activating-an-environment).)

# Day-to-day use

There is a list of 'Action' buttons on the right-hand side of the instance page.

'Report' -> Report a problem

'Image' -> Take a snapshot of the instance with the things you've installed on it, for storing and re-use later.

'Suspend' -> Pause an instance if you're not using it (If you forget to do this, your AU units will slowly be eaten away!)

'Shelve' -> Longer-term storage, without hogging resources (I am told it also avoids the risk of the instance being deleted due to disuse)

'Stop' -> Everything powered off, but still uses up AU units

'Reboot' -> sometimes useful if it gets into a weird state

. . . and to request more resources, go to the project page and click the button with the circled up arrow.

# Add an instance or a volume

On the project page, click on 'New' and select what you want. If you have created a new volume, you need to 'attach' it (button in the upper right of the project page) to the instance, using the same provider location.

# Let's compare run speeds of a for-loop: local and remote dewarping of 8.4 MB FITS frames

In [None]:
# import and define stuff

import numpy as np
from astrom_lmircam_soln import *
from astrom_lmircam_soln import polywarp
from astrom_lmircam_soln import dewarp
from astropy.io import fits
import matplotlib.pyplot as plt
import os
import time
from multiprocessing import Pool

#####################################################################
# SET THE DEWARP COEFFICIENTS

# the below coefficients are relevant for LMIRCam 2048x2048 readouts after modifications in summer 2016
# the below were taken in 2017B (DX ONLY)
Kx = [[ -1.34669677e+01, 2.25398365e-02, -7.39846082e-06, -8.00559920e-11],
 [ 1.03267422e+00, -1.10283816e-05, 5.30280579e-09, -1.18715846e-12],
 [ -2.60199694e-05, -3.04570646e-09, 1.12558669e-12, 1.40993647e-15],
 [ 8.14712290e-09, 9.36542070e-13, -4.20847687e-16, -3.46570596e-19]]
Ky = [[ 1.43440109e+01, 9.90752231e-01, -3.52171557e-06, 7.17391873e-09],
 [ -2.43926351e-02, -1.76691374e-05, 5.69247088e-09, -2.86064608e-12],
 [ 1.06635297e-05, 8.63408955e-09, -2.66504801e-12, 1.47775242e-15],
 [ -1.10183664e-10, -1.67574602e-13, 2.66154718e-16, -1.13635710e-19]]


#####################################################################
# set file paths

dirTreeStem = ('')

retrievalPiece = ('retrieve/')
depositPiece = ('deposit/')
fileNameStem = ('lm_171002_')

#####################################################################

# DEWARP 

# map the coordinates that define the entire image plane (2048x2048)
dewarp_coords = dewarp.make_dewarp_coordinates((2048,2048), np.array(Kx).T, np.array(Ky).T) # transposed due to a coefficient definition change btwn Python and IDL

def dewarp_frame_multiproc(frameNum, extra_dim=True):
    """Dewarp one frame and write it out (takes a single frame number so it can be mapped across CPUs)."""

    start_time = time.time()

    print('Dewarping frame '+str(frameNum)+'...')

    # zero-padded file name shared by the input and output frames
    fileName = fileNameStem+"{:0>5d}".format(frameNum)+'.fits'

    # grab the pre-dewarp image and header
    image, header = fits.getdata(dirTreeStem+retrievalPiece+fileName,
                                 0,
                                 header=True)

    # dewarp the image
    dewarped = dewarp.dewarp_with_precomputed_coords(image, dewarp_coords, order=3)

    # write out
    dewarped = np.squeeze(dewarped) # remove dimensions of size 1
    if extra_dim: # the LEECH pipeline still requires a singleton dimension
        dewarped = dewarped[None,:,:]

    hdu = fits.PrimaryHDU(dewarped, header=header)
    hdulist = fits.HDUList([hdu])
    hdulist.writeto(dirTreeStem+depositPiece+fileName,
                    overwrite=True)

    # report the wall time spent on this frame, in seconds
    elapsed_time = time.time() - start_time
    print(elapsed_time)

In [None]:
## Run on 1 local core:

# I have to run the function inside a for-loop, because with foresight I designed the function to be mapped to N CPUs

for frame in range(1,100):
    dewarp_frame_multiproc(frame)

In [None]:
## Run on all (2) local cores:

pool = Pool(2) # initialize Pool object for 2 streams of worker processes
channel_eq = pool.map(dewarp_frame_multiproc,range(1,100)) # map the processes and run

In [None]:
## Run on 16 remote cores of CyVerse instance:

## pool = Pool(16)
## channel_eq = pool.map(dewarp_frame_multiproc,range(1,100))

# In-person support

PhTea: Tues. 8-10 a.m. at the Nucleus Café in the Keating building

[Hacky Hour](https://www.meetup.com/ResBazAZ/events/247026059/): Thurs. 4-7 PM, currently at Gentle Ben's

# Scaling up: what other resources are there?

Open Science Grid with Pegasus

Google Cloud Platform

XSEDE

Jetstream

Docker and DockerHub --> stay tuned for C.K.'s spiel

Singularity and SingularityHub (note OSG and XSEDE use Singularity)

Learn about cloud computing with containers at Container Camp