Usage examples

Note

In the following examples, test_data can be replaced with the directory containing the FAST5 files from your own runs. If you are new to ONT sequencing, the test_data directory is shipped with poretools for experimentation.

poretools fastq

Extract sequences in FASTQ format from a set of FAST5 files.

poretools fastq test_data/*.fast5

Or, if there are too many files for your OS to do the wildcard expansion, just provide a directory. poreutils will automatically find all of the FAST5 files in the directory.

poretools fastq test_data/

Extract sequences in FASTQ format from a set of FAST5 files.

poretools fastq test_data/
poretools fastq --min-length 5000 test_data/
poretools fastq --max-length 5000 test_data/
poretools fastq --type all test_data/
poretools fastq --type fwd test_data/
poretools fastq --type rev test_data/
poretools fastq --type 2D test_data/
poretools fastq --type fwd,rev test_data/

A type of “best” will extract the 2D read, if it exists. If not, it will extract either the template or complement read, whichever is available and has a better average Phred score.

poretools fastq --type best test_data/

Only extract sequence with more complement events than template. These are the so-called “high quality 2D reads” and are the most accurate sequences from a given run.

poretools fastq --type 2D --high-quality test_data/

The data in fastq format are returned in standard output.

poretools fasta

Extract sequences in FASTA format from a set of FAST5 files.

poretools fasta test_data/
poretools fasta --min-length 5000 test_data/
poretools fasta --max-length 5000 test_data/
poretools fasta --type all test_data/
poretools fasta --type fwd test_data/
poretools fasta --type rev test_data/
poretools fasta --type 2D test_data/
poretools fasta --type fwd,rev test_data/
poretools fasta --type best test_data/

The data in fasta format are returned in standard output.

poretools combine

Create a tarball from a set of FAST5 (HDF5) files.

# plain tar (recommended for speed)
poretools combine -o foo.fast5.tar test_data/*.fast5

# gzip
poretools combine -o foo.fast5.tar.gz test_data/*.fast5

# bzip2
poretools combine -o foo.fast5.tar.bz2 test_data/*.fast5

poretools yield_plot

Create a collector’s curve reflecting the sequencing yield over time for a set of reads. There are two types of plots. The first is the yield of reads over time:

poretools yield_plot --plot-type reads test_data/

The result should look something like:

../_images/yield.reads.png

The second is the yield of base pairs over time:

poretools yield_plot --plot-type basepairs test_data/

The result should look something like:

../_images/yield.bp.png

Of course, you can save to PDF or PNG with –saveas:

poretools yield_plot \
          --plot-type basepairs \
          --saveas foo.pdf\
          test_data/

poretools yield_plot \
          --plot-type basepairs \
          --saveas foo.png\
          test_data/

If you don’t like the default aesthetics, try –theme-bw:

poretools yield_plot --theme-bw test_data/

poretools squiggle

Make a “squiggle” plot of the signal over time for a given read or set of reads

poretools squiggle test_data/foo.fast5

The result should look something like:

../_images/foo.fast5.png

If you don’t like the default aesthetics, try –theme-bw:

poretools squiggle --theme-bw test_data/

Other options:

# save as PNG
poretools squiggle --saveas png test_data/foo.fast5

# save as PDF
poretools squiggle --saveas pdf test_data/foo.fast5

# make a PNG for each FAST5 file in a directory
poretools squiggle --saveas png test_data/

poretools winner

Report the longest read among a set of FAST5 files.

poretools winner test_data/
poretools winner --type all test_data/
poretools winner --type fwd test_data/
poretools winner --type rev test_data/
poretools winner --type 2D test_data/
poretools winner --type fwd,rev test_data/
poretools winner --type best test_data/

poretools stats

Collect read size statistics from a set of FAST5 files.

poretools stats test_data/
total reads 2286.000000
total base pairs    8983574.000000
mean    3929.822397
median  4011.500000
min 13.000000
max 6864.000000

poretools hist

Plot a histogram of read sizes from a set of FAST5 files.

poretools hist test_data/
poretools hist --min-length 1000 --max-length 10000 test_data/

poretools hist --num-bins 20 --max-length 10000 test_data/

If you don’t like the default aesthetics, try –theme-bw:

poretools hist --theme-bw test_data/

The result should look something like:

../_images/hist.png

poretools nucdist

Look at the nucleotide composition of a set of FAST5 files.

poretools nucdist test_data/
A   78287   335291  0.233489714904
C   75270   335291  0.224491561062
T   92575   335291  0.276103444471
G   84754   335291  0.252777438106
N   4405    335291  0.0131378414571

poretools qualdist

Look at the quality score composition of a set of FAST5 files.

poretools qualdist test_data/
!   0   83403   335291  0.248748102395
"   1   46151   335291  0.137644613187
#   2   47463   335291  0.141557632027
$   3   34471   335291  0.102809201559
%   4   24879   335291  0.0742012162569
&   5   20454   335291  0.0610037251224
'   6   16783   335291  0.0500550268274
(   7   13699   335291  0.0408570465655
)   8   11356   335291  0.0338690868529
*   9   9077    335291  0.0270720061081
+   10  6492    335291  0.0193622852984
,   11  4891    335291  0.014587328619
-   12  3643    335291  0.0108651887465
.   13  2585    335291  0.00770972080968
/   14  1969    335291  0.0058725107444
0   15  1475    335291  0.00439916371152
1   16  1146    335291  0.00341792651756
2   17  902 335291  0.00269020045274
3   18  790 335291  0.00235616225905
4   19  619 335291  0.0018461575169
5   20  532 335291  0.00158668142002
6   21  440 335291  0.00131229290378
7   22  397 335291  0.00118404609727
8   23  379 335291  0.00113036138757
9   24  313 335291  0.000933517452004
:   25  327 335291  0.000975272226215
;   26  138 335291  0.000411582774366
<   27  121 335291  0.000360880548538
=   28  96  335291  0.000286318451733
>   29  76  335291  0.000226668774289
?   30  69  335291  0.000205791387183
@   31  61  335291  0.000181931516205
A   32  48  335291  0.000143159225866
B   33  23  335291  6.8597129061e-05
C   34  14  335291  4.17547742111e-05
D   35  6   335291  1.78949032333e-05
F   37  3   335291  8.94745161666e-06

poretools qualpos

Produce a box-whisker plot of qualoty score distribution over positions in reads.

poretools qualpos test_data/

The result should look something like:

../_images/qualpos.png

poretools tabular

Dump the length, name, seq, and qual of the sequence in one or a set of FAST5 files.

poretools tabular foo.fast5
length  name    sequence    quals
10    @channel_100_read_14_complement   GTCCCCAACAACAC    $%%'"$"%!)

poretools events

Extract the raw nanopore events from each FAST5 file.

poretools events test_data/ | head -5
file    strand  mean    start   stdv    length  model_state model_level move    p_model_state   mp_model_state  p_mp_model_state    p_A p_C p_G p_T raw_index
test_data/2016_3_4_3507_1_ch120_read240_strand.fast5    template    58.3245290305   1559.89409031   1.34165996292   0.0146082337317 CGACTT  58.1304809188   0   0.0226559   CATCTT  0.0229866   0.284469    0.130683    0.137386    0.447461
test_data/2016_3_4_3507_1_ch120_read240_strand.fast5    template    50.1420877511   1559.90869854   0.921372775302  0.0348605577689 GACTTT  49.3934875964   1   0.0849836   GACTTT  0.0849836   0.257314    0.350541    0.101351    0.290794
test_data/2016_3_4_3507_1_ch120_read240_strand.fast5    template    47.5841029424   1559.9435591    0.771398562801  0.00763612217795    ACTTTG  48.2080162623   1   0.108899    TCTTTG  0.13079 0.000477931 0.00853333  0.306356    0.684632
test_data/2016_3_4_3507_1_ch120_read240_strand.fast5    template    51.5879264562   1559.95119522   0.684238307171  0.0112881806109 CTTTGA  52.7784154546   1   0.110625    CTTTGG  0.121103    4.69995e-06 0.00382846  0.0169048   0.979262

Extract the pre-basecalled events from each FAST5 file.

poretools events --pre-basecalled test_data/ | head -5
file    strand  mean    start   stdv    length  model_state     model_level     move    p_model_state   mp_model_state  p_mp_model_state        p_A     p_C     p_G     p_T     raw_index
burn-in-run-2/ch100_file15_strand.fast5     pre_basecalled  51.4652695313   5352344 0.655003995591      35
burn-in-run-2/ch100_file15_strand.fast5     pre_basecalled  60.1776123047   5352379 1.05143911309       18
burn-in-run-2/ch100_file15_strand.fast5     pre_basecalled  48.9152374359   5352397 0.864834628834      67
burn-in-run-2/ch100_file15_strand.fast5     pre_basecalled  55.4002178596   5352464 1.75915620083       17

poretools times

poretools times test_data/ | head -5
channel filename    read_length exp_starttime   unix_timestamp  duration    unix_timestamp_end  iso_timestamp   day hour    minute
120 test_data/2016_3_4_3507_1_ch120_read240_strand.fast5    5826    1457127309  1457128868  47  1457128915  2016-03-04T15:01:08-0700    04  15  01
120 test_data/2016_3_4_3507_1_ch120_read353_strand.fast5    3399    1457127309  1457129863  28  1457129891  2016-03-04T15:17:43-0700    04  15  17
120 test_data/2016_3_4_3507_1_ch120_read415_strand.fast5    2640    1457127309  1457130808  24  1457130832  2016-03-04T15:33:28-0700    04  15  33
120 test_data/2016_3_4_3507_1_ch120_read418_strand.fast5    3487    1457127309  1457130851  31  1457130882  2016-03-04T15:34:11-0700    04  15  34

poretools occupancy

Plot the throughput performance of each pore on the flowcell during a given sequencing run.

poretools occupancy test_data/

The result should look something like:

../_images/occupancy.png

poretools index

Tabulate all file location info and metadata such as ASIC ID and temperature from a set of FAST5 files

poretools index test_data | head -5 | column -t
source_filename                                       template_fwd_length  complement_rev_length  2d_length  asic_id     asic_temp  heatsink_temp  channel  exp_start_time  exp_start_time_string_date  exp_start_time_string_time  start_time  start_time_string_date  start_time_string_time  duration  fast5_version
test_data/2016_3_4_3507_1_ch120_read240_strand.fast5  5826                 5011                   5079       3571011476  30.37      36.99          120      1457127309      2016-Mar-04                 (Fri)                       14:35:09    1457128868              2016-Mar-04             (Fri)     15:01:08       47  metrichor1.16
test_data/2016_3_4_3507_1_ch120_read353_strand.fast5  3399                 2962                   2940       3571011476  30.37      36.99          120      1457127309      2016-Mar-04                 (Fri)                       14:35:09    1457129863              2016-Mar-04             (Fri)     15:17:43       28  metrichor1.16
test_data/2016_3_4_3507_1_ch120_read415_strand.fast5  2640                 2244                   2428       3571011476  30.37      36.99          120      1457127309      2016-Mar-04                 (Fri)                       14:35:09    1457130808              2016-Mar-04             (Fri)     15:33:28       24  metrichor1.16
test_data/2016_3_4_3507_1_ch120_read418_strand.fast5  3487                 2950                   3384       3571011476  30.37      36.99          120      1457127309      2016-Mar-04                 (Fri)                       14:35:09    1457130851              2016-Mar-04             (Fri)     15:34:11       31  metrichor1.16

Extract the metadata from the fast5 file

poretools metadata  013731_11rx_v2_3135_1_ch20_file19_strand.fast5

asic_id asic_temp   heatsink_temp
31037   28.11   37.88

poretools metadata --read  013731_11rx_v2_3135_1_ch20_file19_strand.fast5
filename    scaling_used    abasic_peak_height  hairpin_polyt_level median_before   start_time  read_id read_number hairpin_peak_height abasic_found    abasic_event_index  duration    start_mux   hairpin_found   hairpin_event_index
013731_11rx_v2_3135_1_ch20_file19_strand.fast5    1   124.31769966    0.413218809334  226.393825112   4648221 3b4e45bf-6d42-45bc-9314-1d8a630971c2    19  125.783167256   1   2   195322  4   1   1478