Code Coffee is a bring-your-own-coffee discussion group on computing techniques and tools at Steward Observatory at the University of Arizona.

Announcements are distributed via the astro-code-coffee mailing list.

Next meeting

Topic: TBD

Speaker: TBD

Time: TBD

Location: TBD


  • Introductory C programming for astrophysicists

    Dr. Psaltis gave a one-hour crash course on C programming based on material from his class.

    Slides (PDF)

    Code listings

    ex1.c

    Compile with gcc ./ex1.c -o ex1, run with ./ex1.

    #include <stdio.h>
    
    int main(void) {
        float x;
        /* 0.1 has no exact binary representation, so rounding error
           accumulates in x with every iteration -- watch the output. */
        for (x = 0; x <= 1; x += 0.1) {
            printf("x = %10.8f f(x) = %10.8f\n", x, x * x);
        }
        return 0;
    }
    

    ex2.c

    Compile with gcc ./ex2.c -o ex2, run with ./ex2.

    (Note: on Linux you need to link the math library explicitly: gcc ./ex2.c -o ex2 -lm. See this StackOverflow answer for historical context.)

    #include <stdio.h>
    #include <math.h>
    #include <time.h>
    
    #define Nrep 1000000
    
    int main(void) {
        double x = 1.3, a;
        double time, Mflops;
        int i;
        clock_t ticks1, ticks2;
    
        /* Time Nrep repetitions of a single floating-point operation.
           (Compile without optimization, or the compiler may delete
           the loop entirely, since a is never used afterwards.) */
        ticks1 = clock();
        for (i = 1; i <= Nrep; i++) {
            a = x + x;
        }
        ticks2 = clock();
    
        /* average CPU time per iteration, in seconds */
        time = (1.0 * (ticks2 - ticks1)) / CLOCKS_PER_SEC / Nrep;
        /* one floating-point operation per iteration => 1e-6/time MFLOPS */
        Mflops = 1.e-6 / time;
    
        printf("it took %e seconds\n", time);
        printf("this corresponds to %f MFLOPS\n", Mflops);
        return 0;
    }
    

    Replace the loop body (a = x + x) with lines from the following block and see how the number of MFLOPS changes.

    a=i*x;
    a=i/x;
    a=i/x/x;
    a=i/(x*x);
    a=sin(x)*sin(x)+2.*cos(x)*cos(x);
    a=1.+cos(x)*cos(x);
    a=log(x);
    a=pow(x,5.);
    a=x*x*x*x*x;
    a=i/sqrt(pow(sin(x),2.000001)+2.*pow(cos(x),2.000001));
    a=i*pow(1.+cos(x)*cos(x),-0.5);
    
  • Structure and Development of Computer Programs

    Slides (PDF)

    Presentation materials: https://github.com/Sbozzolo/structure_development_tucson

    Book recommendations:

  • Interactive data visualization with Bokeh

    Slides

    Examples

  • Using containers to move computations to HPC and the Cloud

    Containers allow you to bundle up a program or script, together with all the packages and libraries it uses, into a single archive that runs on any computer with a container runtime like Docker or Singularity (including the University HPC systems).

    To follow the slides, you will need to create a DockerHub account and install Docker Desktop for your platform:

    The full text of the example Dockerfile we built is below:

    # Start from the CentOS 6.10 base image
    FROM centos:6.10
    # Enable the EPEL repository and bring the base packages up to date
    RUN yum install epel-release -y
    RUN yum update -y
    # Add the third-party Nux Dextop repository, which packages ffmpeg for EL6
    RUN rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
    RUN rpm -Uvh http://li.nux.ro/download/nux/dextop/el6/x86_64/nux-dextop-release-0-2.el6.nux.noarch.rpm
    # Install ffmpeg and its development headers
    RUN yum install ffmpeg ffmpeg-devel -y
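
    If you want to try it afterwards, save this as Dockerfile in an empty directory; then something like docker build -t ffmpeg-el6 . should build the image, and docker run --rm -it ffmpeg-el6 ffmpeg -version should run the installed ffmpeg inside a fresh container (the ffmpeg-el6 tag is just an example name).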
    
  • git and GitHub for Research

    Slides.

    Additional resources

  • How to accelerate your code in under 10 lines

    Rachel Smullen gave a talk on using OpenACC, a system of hints for your C, C++, and Fortran code that enables a compiler to move computations to the GPU or other accelerators. She attended the International High Performance Computing Summer School last summer, which covered OpenACC, and was kind enough to share what she learned with all of us.
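
    For a taste of the syntax: in C, putting #pragma acc parallel loop on the line above a for loop asks an OpenACC-aware compiler to parallelize that loop and offload it to an attached accelerator.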

    Her slides

    Example code to start from (exercise and solution in the slides)

    There are also some slides from IHPCSS by John Urbanic that cover OpenACC in more detail:

  • Intro to CyVerse

    What is CyVerse?

    CyVerse started out in 2008 as the NSF-funded ‘iPlant Collaborative’ project, created to serve the changing needs of the life sciences, which have increasingly heavy computational demands (think protein folding, genetic phenotyping, etc.) but have historically not had high-performance computing facilities. The project was rechristened ‘CyVerse’ in 2015.

    Why should an astronomer care about it?

    CyVerse cyberinfrastructure has the serendipitous effect of providing computational resources to basically anyone here at UA and >8,000 participating institutions.

    Possible reasons to care:

    • If you have computational needs that are greater than what your office machines can provide, but less than what would usually pass for HPC jobs
    • If you want to calculate something now, and don’t want to put your job in a queue
    • It is very flexible for remote access or access from low-performance machines (basically all you need is ‘ssh’)
    • This infrastructure serves as a lead-in for addressing needs that span the community, like interoperability and reproducibility.

    This talk was presented in a Jupyter Notebook which can be viewed on GitHub or downloaded directly.

  • How-to Docker (with astroML)

    Slides (PDF)

    This is a minimal Docker tutorial using astroML as an example. Docker is needed in order to try it out; the Docker Community Edition (CE) is free and is available at the Docker website. It supports all the major platforms, from Linux (e.g., Ubuntu) to Mac OS X and even Windows.

    The supporting files for this tutorial are available in the repository for this site under downloads/2017-18/chan-howto-docker-astroML.

    Dockerfile is the main part of this tutorial. It tells Docker how to create a Docker image, which you can then instantiate into Docker containers on any machine with Docker: from your laptop, to a powerful many-core virtual machine (VM) on CyVerse, to thousands of VMs that you launch with container orchestration platforms on Google Cloud Platform.

    Makefile contains the commands that we will run during the tutorial.

    plot_spectrum_sum_of_norms.py is an example from astroML. It is modified to run better in a container environment.

  • Real Programmers Debug with Fire Extinguishers

    Talk slides are available here. A hands-on session with real hardware is offered during the normal Code Coffee time slot (maybe 11/29 or 12/6); email Craig (ckulesa@email…) if you’re interested.

    Resources

    The Antarctic observatory used as an example in the talk is the High Elevation Antarctic Terahertz (HEAT) telescope at Ridge A.

    Go build something and control it with software. Sparkfun and Adafruit have a lot of good resources and some nifty development boards to get you started.

    Talking to hardware: SPI and I2C

    Wikipedia links for I2C and SPI.

    A very low-level example of performing SPI communication via a microcontroller in C is here.

    For the python-centric, look at spidev and I2C, also this.
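
    A minimal spidev sketch, assuming a Linux machine with an SPI device exposed as /dev/spidev0.0 (the clock settings and bytes below are placeholders, not a real device protocol):

    import spidev  # Linux userspace SPI bindings
    
    spi = spidev.SpiDev()
    spi.open(0, 0)              # bus 0, chip select 0 -> /dev/spidev0.0
    spi.max_speed_hz = 500000   # clock rate; device-specific
    spi.mode = 0                # clock polarity/phase; device-specific
    
    # SPI is full duplex: one byte comes back for every byte sent
    response = spi.xfer2([0x01, 0x80])
    print(response)
    spi.close()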

    Talking to hardware: Serial (RS232, RS485, RS422)

    If you get a USB-to-serial converter, ones with a Prolific PL2303 chipset will work generically under just about any operating system without fuss. They can be purchased with flying leads for about $5 and with a DB9 connector for $10-20.

    The bible: Serial Programming Guide for POSIX Operating Systems.

    The pySerial API is excellent and will get you to a working device quickly.
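
    A minimal pySerial sketch (the port name, baud rate, and query string are all device-specific assumptions):

    import serial  # pySerial
    
    ser = serial.Serial('/dev/ttyUSB0', baudrate=9600, timeout=1)
    ser.write(b'*IDN?\r\n')      # hypothetical identification query
    reply = ser.readline()       # reads until newline or the 1 s timeout
    print(reply.decode(errors='replace'))
    ser.close()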

    Talking to hardware: CAN bus

    The SocketCAN or can-utils package will let you display, record, generate, or replay CAN traffic.

    The python-can library is excellent.
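
    A minimal python-can sketch, assuming a SocketCAN interface named can0 is already configured (the arbitration ID and payload are arbitrary examples):

    import can  # python-can
    
    bus = can.interface.Bus(channel='can0', bustype='socketcan')
    
    msg = can.Message(arbitration_id=0x123, data=[0x11, 0x22, 0x33])
    bus.send(msg)
    
    reply = bus.recv(timeout=1.0)  # returns None if nothing arrives within 1 s
    print(reply)
    bus.shutdown()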

    Wrappers for astronomical hardware

    Abstract your system onto the network using network sockets. Examples of basic client and server operations can be found in most languages, like C and Python. This lets you control your instrument using scripts or a GUI without impacting the hardware or low-level control software itself.
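
    A minimal sketch of the pattern in Python (the port number and the one-command "protocol" are made up for illustration):

    import socket
    
    HOST, PORT = '', 5000   # listen on all interfaces, arbitrary port
    
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, addr = srv.accept()   # wait for one client to connect
        with conn:
            cmd = conn.recv(1024).decode().strip()
            # ...translate cmd into a call to the low-level control code...
            conn.sendall(b'OK\n')

    A client is then just socket.create_connection(('hostname', 5000)), a send, and a read, whether it lives in a script or behind a GUI button.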

    INDI, the Instrument Neutral Distributed Interface is a nifty way to wrap up multiple elements of an astronomical system into a clean, self-describing system.

  • Python + joblib: Make your computer work harder, and save yourself time

    The joblib package (pip install joblib) provides helpers for easy parallelization and caching (memoization) of function outputs.

    Rachel presented examples of joblib’s Parallel helper. Her notebook is at

    https://github.com/rsmullen/CodeCoffee/blob/master/CodeCoffee_joblib.ipynb

    Joseph presented the principles behind and use of the joblib Memory helper for caching. His notebook is at

    https://github.com/ua-astro-grads/ua-astro-grads.github.io/blob/master/downloads/2017-18/joblib_memoization.ipynb
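
    A minimal sketch (not from either notebook) of how the two helpers combine; the function and cache-directory names here are made up:

    from joblib import Parallel, delayed, Memory
    
    memory = Memory('./joblib_cache', verbose=0)   # on-disk cache location
    
    @memory.cache                 # results cached on disk, keyed by arguments
    def slow_square(x):
        return x * x              # stand-in for an expensive computation
    
    # fan the calls out across 4 worker processes (n_jobs=-1 uses every core)
    results = Parallel(n_jobs=4)(delayed(slow_square)(i) for i in range(10))
    print(results)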

  • Code Principles and Style

    Things you should do but probably don’t (and probably won’t).

    Learn how to write better code that will be less fragile, easier to read and understand, and easier to use.

    Download the slides

  • Intro to C++11

    Slides (PDF).

  • UA High Performance Computing Resources

    This presentation covers the high-performance computing resources available at the University of Arizona. The presentation materials are on Rachel Smullen’s GitHub at https://github.com/rsmullen/UAHPC

    For posterity, they’re mirrored here below:

    Slides (PDF)

    UA HPC Commands

    The online documentation can be found here and is a good place to start if you have questions.

    Logging in

    To log in to the HPC system, from a campus network or the campus VPN, type

    ssh -Y username@hpc.arizona.edu
    

    You should arrive at the login node, called keymaster, where you’ll see options to log in to either El Gato or Ocelote. Typing elgato or ocelote at the prompt will not forward X11 windows; to keep window forwarding, use ssh -Y elgato (or ssh -Y ocelote).

    To see what storage disks you have access to, use the command uquota.

    Loading software

    Your profile on the supercomputers’ login nodes doesn’t come with any pre-loaded software. To see available packages, type module avail. Then, to load a specific package, type module load modulename.

    For instance, module load python/3.4.1. To see what you have loaded, type module list. (If you don’t want to do this every time, you can add these commands to your .bashrc file.)

    Interacting with the scheduler

    Ocelote uses a scheduler called PBS, while El Gato uses the LSF scheduler. The commands are similar, but different enough to be a pain.

    El Gato

    • To see a list of available queues, type bqueues.
    • To see your running jobs, type bjobs.
    • To see everyone’s jobs, use bjobs -u all.

    Ocelote

    • To see a list of available queues, type qstat -q.
    • To see all of your running jobs, type qstat -u username.
    • To see everyone’s jobs, use qstat.

    Running Jobs

    Embarrassingly Parallel Jobs

    These are jobs where you want to execute the same command several times, with each run independent of the others.

    El Gato

    Here is an example of an El Gato LSF script for an embarrassingly parallel job. Save this in a file named something like lsf.sh.

    #!/bin/bash
    #BSUB -n 1                         ## number of processors given to each process
    #BSUB -e err_somename_%I           ## error files; make somename unique to other runs
    #BSUB -o out_somename_%I           ## output notification files
    #BSUB -q "your queue"              ## can be windfall, standard, or medium, depending on your advisor's allowed queues.
    #BSUB -u username
    #BSUB -J somename[start-finish]    ## Give the job a name (somename) and then fill in the processes you want, eg [1-100] or [1,2,3]
    #BSUB -R "span[ptile=1]"
    ####BSUB -w "done(JobID|JobName)"  ## Ask us about this fanciness
    
    # ${LSB_JOBINDEX} gives the run index 1,2,3...
    
    # use regular linux commands to copy/link executables, input files, etc., run python, or whatever else you want to do.  It will run in the subdirectory some_directory/some_runname${LSB_JOBINDEX}/. 
    
    mkdir some_directory
    mkdir some_directory/some_runname${LSB_JOBINDEX}
    cd some_directory/some_runname${LSB_JOBINDEX}/
    echo "I'm Job number ${LSB_JOBINDEX}"
    

    To execute this script, use bsub < lsf.sh. You can then check your job’s status with bjobs.

    Ocelote

    Here’s the same for Ocelote. The PBS scheduler is different in that you submit a job array. Save this script as something like pbs.sh.

    #!/bin/bash
    ## choose windfall or standard
    #PBS -q queuename
    ## select nodes:cpus per node:memory per node
    #PBS -l select=1:ncpus=1:mem=6gb
    ## the name of your job
    #PBS -N jobname
    ## the name of your group, typically your advisor's username
    #PBS -W group_list=yourgroup
    ## how the scheduler fills in your nodes
    #PBS -l place=pack:shared
    ## the length of time for your job
    #PBS -l walltime=1:00:00
    ## the indexes of your job array
    #PBS -J 1-5
    ## the location for your error files; this must exist first
    #PBS -e errorfiles/
    ## the location for your output files; this must exist first
    #PBS -o outfiles/
    
    # Now you can use your normal linux commands
    
    # Run the program for array index ${PBS_ARRAY_INDEX}
    echo  "I'm Job number ${PBS_ARRAY_INDEX}"
    

    You can submit your job with qsub pbs.sh and then you can check your job with qstat -u yourname -t.

    Parallel Jobs

    We can also run parallel jobs on a supercomputer. (After all, that’s what they were designed for!)

    El Gato

    Here’s an example MPI script. Save it in lsf.sh. You can get the code in Rixin’s directory at /home/u5/rixin/mpi_hello_world.

    #!/bin/bash
    ###========================================
    # set the job name
    #BSUB -J mpi_test
    # set the number of cores in total
    #BSUB -n 32
    # request 16 cores per node
    #BSUB -R "span[ptile=16]"
    # request standard output (stdout) to file lsf.out
    #BSUB -o lsf.out
    # request error output (stderr) to file lsf.err
    #BSUB -e lsf.err
    # set the queue for this job (windfall, standard, or medium)
    #BSUB -q "medium"
    #---------------------------------------------------------------------
    
    ### load modules needed
    module load openmpi
    
    ### pre-execution work
    cd ~/mpi_hello_world
    make # compile the code, in this example case
    
    ### set directory for job execution
    cd ./elgato_sample_run
    ### run your program
    mpirun -np 32 ../mpi_hello_world > elgato_sample_output.txt
    ### end of script
    

    Use the same commands to submit and check the status as before.

    Ocelote

    And the same for Ocelote:

    #!/bin/bash
    ##set the job name
    #PBS -N mpi_test
    ##set the PI group for this job
    #PBS -W group_list=kkratter
    ##set the queue for this job as windfall
    #PBS -q windfall
    ##request email when job begins and ends
    #PBS -m bea
    ##set the number of nodes, cores, and memory that will be used
    #PBS -l select=2:ncpus=28:mem=1gb
    ##specify "wallclock time" required for this job, hhh:mm:ss
    #PBS -l walltime=00:01:00
    ##specify cpu time = walltime * num_cpus
    #PBS -l cput=1:00:00
    
    ###load modules needed
    module load openmpi/gcc/2
    
    ###pre-execution work
    cd ~/mpi_hello_world
    make # compile the code, in this example case
    
    ###set directory for job execution
    cd ./ocelote_sample_run
    ###run your executable program with begin and end date and time output
    date
    /usr/bin/time mpirun -np 56 ../mpi_hello_world > ocelote_sample_output.txt
    date
    

    Killing jobs

    If you realize you made a mistake, or you want to kill a job that has been running for too long, use bkill jobid on El Gato or qdel jobid[].head1 on Ocelote.

    Interactive Nodes

    Do not, I repeat, DO NOT run programs on the login node. You’re using up resources for people who just want to check their job status! Instead, request an interactive node, which lets you run programs on a compute node and use as much of its resources as you want.

    To get an interactive node, you submit a job to the scheduler requesting interactive resources. On El Gato, use bsub -XF -Is bash and on Ocelote, use qsub -I -N jobname -W group_list=groupname -q yourqueue -l select=1:ncpus=28:mem=168gb -l cput=1:0:0 -l walltime=1:0:0.

  • GitHub hands-on

    This is a hands-on tutorial on GitHub that will cover the following topics:

    • Creating and setting up a GitHub repository.
    • How to commit to your repository.
    • How to create branches.
    • How to do forks and pull requests.

    Here are the slides!

    Feel free to email us at:

    ektapatel [at] email [dot] arizona [dot] edu

    jngaravitoc [at] email [dot] arizona [dot] edu

  • Object Oriented Programming in Python

    You can find my talk as a Jupyter Notebook located here:

    https://github.com/swyatt7/CC_ObjectOriented

    It gives an overview of the four pillars of Object-Oriented Programming (OOP), then provides a basic tutorial on how to implement classes in Python. There are also some astronomical examples, followed by some cool OOP features in Python.
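
    As a tiny illustration of the kind of thing classes give you (this example is mine, not from the notebook):

    class Star:
        def __init__(self, name, magnitude):   # encapsulation: data lives with behavior
            self.name = name
            self.magnitude = magnitude
    
        def is_brighter_than(self, other):
            return self.magnitude < other.magnitude   # smaller magnitude = brighter
    
    class VariableStar(Star):                   # inheritance: a VariableStar is a Star
        def __init__(self, name, magnitude, period_days):
            super().__init__(name, magnitude)
            self.period_days = period_days
    
    sirius = Star("Sirius", -1.46)
    mira = VariableStar("Mira", 3.0, 332)
    print(sirius.is_brighter_than(mira))        # True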

    If you ever have any questions, feel free to email me at swyatt@email.arizona.edu

  • Achieving maximum website

    Slides (PDF): Download

    Resources

    GitHub Pages

    Free hosting for small websites, decoupled from current institutional affiliation (but dependent on the continued generosity of a private company). GitHub Pages can automatically run Jekyll for you (see the section on static site generators).

    Domainr

    Check for availability of a domain name quickly.

    Static site generators

    Static site generators run once after you update your site and regenerate any modified HTML pages. In terms of maintenance and performance, static HTML pages are much nicer than dynamically generated sites (think, in order of decade: SSI, PHP, Perl, Python, Ruby, etc.).

    Jekyll is written in Ruby and has pretty good documentation.

    Pelican is written in Python and has a comparable feature set, but the documentation is a bit confusing. (I like the template syntax it uses a little more.)

    Experimenting with HTML/CSS/JS

    Two useful sites for experimenting with HTML markup, CSS, or JavaScript code (e.g. to produce a minimal example of some issue you have) are jsfiddle.net (which I showed) and CodePen.io. They offer a multi-panel editor where you can see the resulting page in the same browser window.

subscribe via RSS