Cloud Genomics

Introduction 2

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Key question

Objectives
  • First objective.

What are some of reasons to access a remote computer system?

Main Computational Resources

Computer
Node

ssh account@datacarpentry.host
df -h

wget http://www.hpa-bioinformatics.org.uk/lgp/resource/Sample280.fastq.gz
ls -lah Sample280.fastq.gz

gunzip Sample280.fastq.gz
ls -lah Sample280.fastq

The output on iPlant Atmosphere:

--2015-03-25 09:40:33--  http://www.hpa-bioinformatics.org.uk/lgp/resource/Sample280.fastq.gz
Resolving www.hpa-bioinformatics.org.uk... 194.74.226.185
Connecting to www.hpa-bioinformatics.org.uk|194.74.226.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 215949113 (206M) [application/x-gzip]
Saving to: \u201cSample280.fastq.gz\u201d

100%[==========================================================================================>] 215,949,113  213K/s   in 16m 48s 

2015-03-25 09:57:21 (209 KB/s) - \u201cSample280.fastq.gz\u201d saved [215949113/215949113]

-rw-r--r-- 1 dnasubway01 iplant-everyone 206M Jun 29  2011 Sample280.fastq.gz

-rw-r--r-- 1 dnasubway01 iplant-everyone 627M Jun 29  2011 Sample280.fastq

The output on Amazon EC2:

--2015-03-25 15:30:23--  http://www.hpa-bioinformatics.org.uk/lgp/resource/Sample280.fastq.gz
Resolving www.hpa-bioinformatics.org.uk (www.hpa-bioinformatics.org.uk)... 194.74.226.185
Connecting to www.hpa-bioinformatics.org.uk (www.hpa-bioinformatics.org.uk)|194.74.226.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 215949113 (206M) [application/x-gzip]
Saving to: \u2018Sample280.fastq.gz\u2019

Sample280.fastq.gz                100%[===============================================================>] 205.94M   376KB/s   in 9m 23s 

2015-03-25 15:39:46 (375 KB/s) - \u2018Sample280.fastq.gz\u2019 saved [215949113/215949113]

-rw-rw-r-- 1 ec2-user ec2-user 206M Jun 29  2011 Sample280.fastq.gz

-rw-rw-r-- 1 ec2-user ec2-user 627M Jun 29  2011 Sample280.fastq

The output on Azure:

--2015-03-25 14:37:18--  http://www.hpa-bioinformatics.org.uk/lgp/resource/Sample280.fastq.gz
Resolving www.hpa-bioinformatics.org.uk... 194.74.226.185
Connecting to www.hpa-bioinformatics.org.uk|194.74.226.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 215949113 (206M) [application/x-gzip]
Saving to: \u201cSample280.fastq.gz\u201d

100%[===========================================================================================================================================>] 215,949,113  216K/s   in 16m 26s 

2015-03-25 14:53:44 (214 KB/s) - \u201cSample280.fastq.gz\u201d saved [215949113/215949113]

-rw-r--r--. 1 root root 206M Jun 29  2011 Sample280.fastq.gz

-rw-r--r--. 1 root root 627M Mar 25 14:57 Sample280.fastq
cat /proc/cpuinfo

processor  : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping    : 1
cpu MHz     : 2400.084
cache size  : 4096 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips    : 4800.16
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping    : 1
cpu MHz     : 2400.084
cache size  : 4096 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good unfair_spinlock pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips    : 4800.16
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

On an Amazon EC2 instance, the following output shows availability of 2 Intel Xeon processors:

cat /proc/cpuinfo 
processor  : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x415
cpu MHz     : 2494.060
cache size  : 25600 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips    : 4988.12
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x415
cpu MHz     : 2494.060
cache size  : 25600 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips    : 4988.12
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

On an Microsoft Azure instance, the following output shows availability of 2 Intel Xeon processors:

processor  : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 63
model name  : Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz
stepping    : 2
cpu MHz     : 1995.365
cache size  : 40960 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms
bogomips    : 3990.73
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 63
model name  : Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz
stepping    : 2
cpu MHz     : 1995.365
cache size  : 40960 KB
physical id : 0
siblings    : 4
core id     : 1
cpu cores   : 4
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms
bogomips    : 3990.73
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 63
model name  : Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz
stepping    : 2
cpu MHz     : 1995.365
cache size  : 40960 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms
bogomips    : 3990.73
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 63
model name  : Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz
stepping    : 2
cpu MHz     : 1995.365
cache size  : 40960 KB
physical id : 0
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 3
initial apicid  : 3
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms
bogomips    : 3990.73
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:
top
time <your_application>
time awk "/\/1$/{print; getline; print}" Sample280.fastq > Sample280_1.fastq

The output on iPlant Atmosphere, Amazon EC2, and Microsoft Azure are shown respectively below, indicating that EC2 took half of the time of Azure.

real  0m20.893s
user    0m19.573s
sys 0m1.309s

real  0m14.856s
user    0m14.260s
sys 0m0.592s

real  0m36.294s
user    0m12.762s
sys 0m0.432s

Distributed System Definitions and stacks:

(Note that many definitions exist for these terms)

HPC vs. Cloud:

HPC Cloud
User account on the system root account on the system
Limited control of the system Full control of the system
Central shared file system Local file system
Jobs submitted into a queue Jobs executed on each resource
Account-based isolation OS-based isolation
Batch-oriented execution of applications support for batch or interactive applications
Request for resource and time allocation Pay-per-use
etc. etc.

HPC vs.
Cloud

Resources:

Key Points