In this practical we will learn how to use bowtie to align short sequences to a genome.
Bowtie is the first version of a series of programs designed to align short sequences to genomes. These include: bowtie, bowtie2, tophat, tophat2, hisat y hisat2.
More recent programs have focused on allowing more flexibility during mapping, such as allowing an increased number of mismatches, better handling of gaps in the alignment, and properly dealing with introns and paired-end reads. If you are planning on analysing regular RNA-Seq data (sequencing mRNA) I recommend you try out hisat2, and definitely do not use bowtie.
Nevertheless, bowtie still remains one of the most practical options to align very short sequences, such as small RNAs, since we usually do not want to allow a large number of mismatches. In fact, in some cases we will be interested in focusing on perfectly-mapping reads, and bowtie is perfectly suited for this.
Bowtie and other aligners use the Burrows-Wheeler transform to compress the genome sequence, allowing even the human genome to fit on about 3-4 GB of RAM. The Wikipedia has a basic description of this algorith: Burrows-Wheeler_transform.
Any aligner, including bowtie, will require quite a bit of compute time and resources when working with a large genome. For this reason, we will be using a smaller part of the genome to make this practical more efficient. The file I have prepared contains the first 10,000,000 nt of the first three chromosomes of Heligmosomoides bakeri. In addition, I have included sequences from NCBI’s UniVec Database, which is a common resource to identify potential contaminants resulting from vectors, linkers and primers commonly used during cloning experiments.
You will need to download the following two files:
These last files are essentially the same ones as you produced at the
end of the last practical. Remember that to extract all the files from a
tar file you need to execute:
tar -xvf sRNA_cutadapt_100k.tar.
Remember we need at least one file with short sequences and a reference sequence (genome, transcriptome, chromosomes, etc). Make sure you already have these two files in your working directory.
Bowtie has two programs we will be using: bowtie and
bowtie-build. We first need to use
bowtie-build to pre-process any reference sequence we want
to use for alignments.
You can find a detailed explanation of all parameters in the manual.
To see the parameters for one of these programs, we ask for the help:
## Usage: bowtie-build [options]* <reference_in> <ebwt_outfile_base>
## reference_in comma-separated list of files with ref sequences
## ebwt_outfile_base write Ebwt data to files with this dir/basename
## Options:
## -f reference files are Fasta (default)
## -c reference sequences given on cmd line (as <seq_in>)
## --large-index force generated index to be 'large', even if ref
## has fewer than 4 billion nucleotides
## -a/--noauto disable automatic -p/--bmax/--dcv memory-fitting
## -p/--packed use packed strings internally; slower, uses less mem
## --bmax <int> max bucket sz for blockwise suffix-array builder
## --bmaxdivn <int> max bucket sz as divisor of ref len (default: 4)
## --dcv <int> diff-cover period for blockwise (default: 1024)
## --nodc disable diff-cover (algorithm becomes quadratic)
## -r/--noref don't build .3/.4.ebwt (packed reference) portion
## -3/--justref just build .3/.4.ebwt (packed reference) portion
## -o/--offrate <int> SA is sampled every 2^offRate BWT chars (default: 5)
## -t/--ftabchars <int> # of chars consumed in initial lookup (default: 10)
## --threads <int> # of threads
## --ntoa convert Ns in reference to As
## --seed <int> seed for random number generator
## -q/--quiet verbose output (for debugging)
## -h/--help print detailed description of tool and its options
## --usage print this usage message
## --version print version information and quit
There are a few advanced options that you might need to tweak at some
point. For instance, if your genome has a large fraction of repeat
sequences, and you want bowtie to be faster, at the expense of more
memory, you can decrease the parameter --offrate to 3 or 2.
But in general, we can use the defaults.
To generate our reference index, ready for mapping:
Settings:
Output files: "hbak.*.ebwt"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 5 (one in 32)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
Hbakeri_small_UniVec.fa.gz
This might take a couple of minutes, depending on your computer.
Once the index is finished, we can inspect the information it
contains with the command bowtie-inspect:
## SA-Sample 1 in 32
## FTab-Chars 10
## Sequence-1 I 10000000
## Sequence-2 II 10000000
## Sequence-3 III 10000000
## Sequence-4 univec|X66730.1:1-2687-49 B.bronchiseptica plasmid pBBR1 genes for mobilization and replication 2736
## Sequence-5 univec|J01749.1:1-4361-49 Cloning vector pBR322 4410
## Sequence-6 univec|J02400.1:1-5243-49 Simian virus 40 5292
## Sequence-7 univec|J02482.1:1-5386-49 Coliphage phi-X174 5435
## Sequence-8 univec|V00604.2:1-6407-49 Phage M13 genome 6456
## Sequence-398 univec|AF310136.1:4127-5198 Plasposon NKBOR 1072
## Sequence-399 univec|L19900.1:1-667 Cloning vector cosmid svPHEP DNA sequence encoding beta-lactamase and HSV thymidine kinase genes 667
## Sequence-400 univec|L19900.1:803-1010 Cloning vector cosmid svPHEP DNA sequence encoding beta-lactamase and HSV thymidine kinase genes 208
## Sequence-401 univec|L19900.1:2493-3111 Cloning vector cosmid svPHEP DNA sequence encoding beta-lactamase and HSV thymidine kinase genes 619
## Sequence-402 univec|M99566.1:573-1230 sCos cloning vector SfiI containing bacteriophage promoters and flanking restriction sites in sCos vectors 658
## Sequence-403 univec|M99566.1:1654-2277 sCos cloning vector SfiI containing bacteriophage promoters and flanking restriction sites in sCos vectors 624
## Sequence-404 univec|D16587.1:1-258 Unidentified cloning vector pNTT104 DNA, vector for E. coli, polylinker sequence 258
## Sequence-405 univec|NGB00603.1:1-319 Cloning vector pET-28a cloning and expression region 319
## Sequence-406 univec|NGB00075.1:11-234 pFLC-I polylinker for RH clones 224
## Sequence-407 univec|NGB00153.1:1-234 pUCm-T multiple cloning site 234
This allows us to see that we have 407 sequences. The first three sequences are much longer (10,000,000 nt) and are named I, II, III. These are the first part of the H. bakeri chromosomes. The rest are the UniVec sequences I mentioned above.
The bowtie-inspect command can be useful to list the
contents of genomes with many sequences (usually scaffolds or contigs)
or even for measuring the length of chromosomes.
When we have our index ready, we can use the bowtie
command to map our reads. This command, similarly to other mapping
programs, has many different options so we can control how the alignment
will be performed, and which results we would like to save. You can
simply type bowtie to see a brief description of these
options. For a more complete description, you can have a look at the manual.
Here I show you some of the most important options:
No index, query, or output file specified!
Usage:
bowtie [options]* -x <ebwt> {-1 <m1> -2 <m2> | --12 <r> | --interleaved <i> | <s>} [<hit>]
<ebwt> Index filename prefix (minus trailing .X.ebwt).
<m1> Comma-separated list of files containing upstream mates (or the
sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the
sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be
a mixture of paired and unpaired. Specify "-" for stdin.
<i> Files with interleaved paired-end FASTQ reads.
<s> Comma-separated list of files containing unpaired reads, or the
sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)
Input:
-q query input files are FASTQ .fq/.fastq (default)
-f query input files are (multi-)FASTA .fa/.mfa
-5/--trim5 <int> trim <int> bases from 5' (left) end of reads
-3/--trim3 <int> trim <int> bases from 3' (right) end of reads
Alignment:
-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-m <int> suppress all alignments if > <int> exist (def: no limit)
Reporting:
-k <int> report up to <int> good alignments per read (default: 1)
-a/--all report all alignments per read (much slower than low -k)
--best hits guaranteed best stratum; ties broken by quality
Output:
-t/--time print wall-clock time taken by search phases
--al <fname> write aligned reads/pairs to file(s) <fname>
--un <fname> write unaligned reads/pairs to file(s) <fname>
--suppress <cols> suppresses given columns (comma-delim'ed) in default output
SAM:
-S/--sam write hits in SAM format
Performance:
-p/--threads <int> number of alignment threads to launch (default: 1)
The next command will use the index (-x)
hbak to map all the sequences contained in the file
Pure_EV_r1_100k.trimmed.fa.gz and will save the alignment
results in Pure_EV_r1.aligned.bwt. The other parameters
specify that we only want full-length alignments (the whole read must
align, not just part of it) with at most 1 mismatch (-v 1),
that we only want to save one alignment for each sequence
(-k 1) and that the sequences we want to align are in fasta
format (-f) since the default is fastq.
## # reads processed: 99975
## # reads with at least one alignment: 90810 (90.83%)
## # reads that failed to align: 9165 (9.17%)
## Reported 90810 alignments
We can see that the great majority of sequences did align to our
index, which could be expected since our Pure_EV libraries
come from purified vesicles from H. bakeri and should ideally
contain no material from other species.
Let us have a look at the alignment output:
## D00261:444:CBEUGANXX:3:1105:9313:2248 1:N:0:AGTCAA - II 2384331 TTTACGACCACGACGCTTCCAC IIIIIIIIIIIIIIIIIIIIII 0 16:A>G
## D00261:444:CBEUGANXX:3:1208:12895:95475 1:N:0:AGTCAA - I 553644 ACCCGACAGAGTCACATATCTC IIIIIIIIIIIIIIIIIIIIII 0
## D00261:444:CBEUGANXX:3:1203:1482:96832 1:N:0:AGTCAA - I 464568 CCAGCTATTCCAATCCCAACCTTC IIIIIIIIIIIIIIIIIIIIIIII 0
## D00261:444:CBEUGANXX:3:2314:8667:62576 1:N:0:AGTCAA + I 9467610 GTACCGACTTGCAAAGAATCGCC IIIIIIIIIIIIIIIIIIIIIII 9
## D00261:444:CBEUGANXX:3:1203:14699:72177 1:N:0:AGTCAA + I 5014505 GGCGATCGTGGATTTGGCGG IIIIIIIIIIIIIIIIIIII 0
## D00261:444:CBEUGANXX:3:2315:17872:58338 1:N:0:AGTCAA + II 5699454 GTACTTTGAAGATCGCTCCTCCGA IIIIIIIIIIIIIIIIIIIIIIII 2
## D00261:444:CBEUGANXX:3:1105:14256:2373 1:N:0:AGTCAA + I 9467128 GAAGATCGTGCCGTATGAGACGG IIIIIIIIIIIIIIIIIIIIIII 14
## D00261:444:CBEUGANXX:3:2312:3892:98535 1:N:0:AGTCAA - II 5328543 CTACACTTCAGCCGCCGGG IIIIIIIIIIIIIIIIIII 0 2:T>G
## D00261:444:CBEUGANXX:3:1109:1994:86472 1:N:0:AGTCAA - I 554745 TCGTTCCAAGCAATACCACAAAC IIIIIIIIIIIIIIIIIIIIIII 0
## D00261:444:CBEUGANXX:3:1105:20944:2331 1:N:0:AGTCAA + II 4221840 GGTATCAGACGTTGTGGAGGG IIIIIIIIIIIIIIIIIIIII 3
The columns are:
Knowing a bit of Linux allows us to explore some more questions, taking advantage of this simple output format.
## 90810 Pure_EV_r1_100k.aligned.bwt
## 51414 I
## 19807 II
## 19587 III
## 2 univec|AF327712.1:16794-18581
So there are some reads that appear to be contaminants! In this case the number is probably too small to cause any problems in our analyses.
MODEK_ctrl libraries represent pure
mouse cells, MODEK_EV are mouse cells treated with H.
bakeri vesicles, and the Pure_EV libraries are from
purified vesicles.Due to the great variety of alignment or mapping tools for short
reads against genomes, there was a need to design a standard format that
all tools could follow. This has the advantage that downstream tools
(such as those designed to find SNPs, detect structural variants, or
quantify gene expression) can be designed to expect one single input
format. This is the reason why the SAM (Sequence Alignment/Map) format
was designed, and samtools was developed to manipulate this
format (www.htslib.org).
The SAM format is designed to be simple, lightweight and easy to process by a computer. It also allows flexibility to include a lot of information about the alignments, that other programs might require. The format is described in the following document: SAM1.pdf.
To get a better idea of how to work with this format, we will use
bowtie again to repeat the previous alignment, but now saving the output
in SAM format (-S):
## # reads processed: 99975
## # reads with at least one alignment: 90810 (90.83%)
## # reads that failed to align: 9165 (9.17%)
## Reported 90810 alignments
Let’s have a look at the SAM output:
## @HD VN:1.0 SO:unsorted
## @SQ SN:I LN:10000000
## @SQ SN:II LN:10000000
## @SQ SN:III LN:10000000
## @SQ SN:univec|X66730.1:1-2687-49 LN:2736
## @SQ SN:univec|J01749.1:1-4361-49 LN:4410
## @SQ SN:univec|J02400.1:1-5243-49 LN:5292
## @SQ SN:univec|J02482.1:1-5386-49 LN:5435
## @SQ SN:univec|V00604.2:1-6407-49 LN:6456
## @SQ SN:univec|J02459.1:1-48502-49 LN:48551
The lines that begin with a @ are the headers that
include meta-data about the alignment. In this case we can see
information about the version of the program, if the file is sorted, the
name and length of the reference sequences.
## D00261:444:CBEUGANXX:3:2303:12416:92715 0 I 7373145 255 26M * 0 0 TCCGTTGTGGTCTAGTGGTTAGGATT IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:26 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2214:4626:10034 16 I 553550 255 23M * 0 0 TCTTCTCCGATCGAAGGATGTGC IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2312:3644:36731 0 II 7491950 255 23M * 0 0 GTACCTTGGATCAGCAGTCGCGT IIIIIIIIIIIIIIIIIIIIIII XA:i:1 MD:Z:6G16 NM:i:1 XM:i:1
## D00261:444:CBEUGANXX:3:2102:10304:18281 0 III 1836230 255 23M * 0 0 GATACTCTGACAGTCGAAGCACC IIIIIIIIIIIIIIIIIIIIIII XA:i:1 MD:Z:3T19 NM:i:1 XM:i:1
## D00261:444:CBEUGANXX:3:1308:19487:56966 0 I 9228741 255 23M * 0 0 GTCACAGACGCGAAGATCGTGCC IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2109:11156:62889 0 I 2590445 255 23M * 0 0 GAAGATCGTGCCGTATGAGACGG IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2303:6343:92983 16 I 556397 255 30M * 0 0 CGTCCAACAGTCCGTGCTCGGTAAAGGAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:30 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:1209:9868:35563 0 I 6512832 255 21M * 0 0 GCTCAAAGATCTTCATGCTGT IIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:21 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2214:17713:20815 4 * 0 0 * * 0 0 GTTCACTAATCCGCGGAGACTCTT IIIIIIIIIIIIIIIIIIIIIIII XM:i:0
## D00261:444:CBEUGANXX:3:2210:17753:50480 0 I 5012294 255 26M * 0 0 GTTCTAATTCGAATGTAGGGTTCTTT IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:26 NM:i:0 XM:i:1
Each remaining line (we’re just looking at the last 10 here) corresponds to an alignment between an input sequence and a reference. The first 11 columns are required (followed by optional columns) and contain the following information:
Let’s have a look at all the flags that were found in our
sam file:
## 0
## 16
## 4
SAM files are stored in plain text, and we can view their contents
directly. But, they take up a lot of space. The equivalent binary format
is called BAM, and has the advantage of being compressed and is better
suited for further processing. But, we need programs like
samtools to view the contents of a BAM file.
The program samtools has a series of internal commands,
each allowing a variety of processes on SAM (or BAM) files.
Typing samtools shows us the available commands, and
choosing one gives us the options available.
##
## Program: samtools (Tools for alignments in the SAM format)
## Version: 1.21 (using htslib 1.21)
##
## Usage: samtools <command> [options]
##
## Commands:
## -- Indexing
## dict create a sequence dictionary file
## faidx index/extract FASTA
## fqidx index/extract FASTQ
## index index alignment
##
## -- Editing
## calmd recalculate MD/NM tags and '=' bases
## fixmate fix mate information
## reheader replace BAM header
## targetcut cut fosmid regions (for fosmid pool only)
## addreplacerg adds or replaces RG tags
## markdup mark duplicates
## ampliconclip clip oligos from the end of reads
##
## -- File operations
## collate shuffle and group alignments by name
## cat concatenate BAMs
## consensus produce a consensus Pileup/FASTA/FASTQ
## merge merge sorted alignments
## mpileup multi-way pileup
## sort sort alignment file
## split splits a file by read group
## quickcheck quickly check if SAM/BAM/CRAM file appears intact
## fastq converts a BAM to a FASTQ
## fasta converts a BAM to a FASTA
## import Converts FASTA or FASTQ files to SAM/BAM/CRAM
## reference Generates a reference from aligned data
## reset Reverts aligner changes in reads
##
## -- Statistics
## bedcov read depth per BED region
## coverage alignment depth and percent coverage
## depth compute the depth
## flagstat simple stats
## idxstats BAM index stats
## cram-size list CRAM Content-ID and Data-Series sizes
## phase phase heterozygotes
## stats generate stats (former bamcheck)
## ampliconstats generate amplicon specific stats
##
## -- Viewing
## flags explain BAM flags
## head header viewer
## tview text alignment viewer
## view SAM<->BAM<->CRAM conversion
## depad convert padded BAM to unpadded BAM
## samples list the samples in a set of SAM/BAM/CRAM files
##
## -- Misc
## help [cmd] display this help message or help for [cmd]
## version detailed version information
To start out, let’s look at one of the most widely used options, viewing the contents of a file:
## D00261:444:CBEUGANXX:3:1306:11355:85767 4 * 0 0 * * 0 0 TTCTGCCCAGTGCTCTGAATGTCAAAG IIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
## D00261:444:CBEUGANXX:3:1105:9313:2248 16 II 2384332 255 22M * 0 0 TTTACGACCACGACGCTTCCAC IIIIIIIIIIIIIIIIIIIIII XA:i:1 MD:Z:5A16 NM:i:1 XM:i:1
## D00261:444:CBEUGANXX:3:1208:12895:95475 16 I 553645 255 22M * 0 0 ACCCGACAGAGTCACATATCTC IIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:22 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:1203:1482:96832 16 I 464569 255 24M * 0 0 CCAGCTATTCCAATCCCAACCTTC IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:1214:9191:67596 4 * 0 0 * * 0 0 AGAAGACCCTGTTGAGCTTGACTC IIIIIIIIIIIIIIIIIIIIIIII XM:i:0
## D00261:444:CBEUGANXX:3:2314:8667:62576 0 I 9467611 255 23M * 0 0 GTACCGACTTGCAAAGAATCGCC IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:1203:14699:72177 0 I 5014506 255 20M * 0 0 GGCGATCGTGGATTTGGCGG IIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:20 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2315:17872:58338 0 II 5699455 255 24M * 0 0 GTACTTTGAAGATCGCTCCTCCGA IIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:24 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:1105:14256:2373 0 I 9467129 255 23M * 0 0 GAAGATCGTGCCGTATGAGACGG IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 XM:i:1
## D00261:444:CBEUGANXX:3:2312:3892:98535 16 II 5328544 255 19M * 0 0 CTACACTTCAGCCGCCGGG IIIIIIIIIIIIIIIIIII XA:i:1 MD:Z:16T2 NM:i:1 XM:i:1
head Pure_EV_r1_100k.aligned.sam. Can you see
why?Although samtools view has many options, lets just use
it to convert to the compressed binary BAM format. We can then see that
the same view command still works.
samtools view -b Pure_EV_r1_100k.aligned.sam > Pure_EV_r1_100k.aligned.bam
ls -l Pure_EV_r1_100k.aligned.*## -rw-r--r--@ 1 cei staff 2993644 May 18 18:23 Pure_EV_r1_100k.aligned.bam
## -rw-r--r--@ 1 cei staff 10574594 May 18 18:23 Pure_EV_r1_100k.aligned.bwt
## -rw-r--r--@ 1 cei staff 14004776 May 18 18:23 Pure_EV_r1_100k.aligned.sam
As you can see above, the same file takes much less space in BAM format. In general you should try to avoid keeping SAM files: always convert them to BAM.
If we want to view the alignment on a genome browser such as IGV (see below) we first need to sort our alignment. This means sorting the reads according to their mapping position, and not according to their position in the raw sequencing file.
samtools sort Pure_EV_r1_100k.aligned.bam > Pure_EV_r1_100k.sorted.bam
samtools view Pure_EV_r1_100k.aligned.bam | cut -f 2-4 | head## 4 * 0
## 16 II 2384332
## 16 I 553645
## 16 I 464569
## 4 * 0
## 0 I 9467611
## 0 I 5014506
## 0 II 5699455
## 0 I 9467129
## 16 II 5328544
## 0 I 26488
## 0 I 26488
## 0 I 26634
## 0 I 26635
## 0 I 26665
## 0 I 26665
## 0 I 26665
## 0 I 26665
## 0 I 26665
## 0 I 26665
Some programs will need you to further index these
sorted BAM files, which is achieved as follows:
## -rw-r--r--@ 1 cei staff 1810811 May 18 18:23 Pure_EV_r1_100k.sorted.bam
## -rw-r--r--@ 1 cei staff 52080 May 18 18:23 Pure_EV_r1_100k.sorted.bam.bai
Several genome browsers can directly import alignments in sorted BAM format, so we can see where our aligned reads are piling up across the chromosomes. One example of such a browser is (IGV), which you can download and run locally for best performance and flexibility.
There is also a web version of the IGV app. I have inserted an instance of this below, having loaded the following files (which are just small variations of the files we have been using in this practical):
samtools faidx Hbakeri_small.fa):
If you want to use your own browser, with everything pre-loaded, you can also use this link
bedtools: a powerful toolset for genome arithmetic, including converting bam to different formats, generating coverage tracks, finding overlaps between reads and annotation, etc. https://bedtools.readthedocs.io/en/latest/index.html
I’ve used bedtools to produce separate coverage tracks
for reads that map to positive or negative strand. The below is part of
a bash script:
bedtools genomecov -ibam $bamFile -bg -strand '+' > $outBase.us
bedtools genomecov -ibam $bamFile -bg -strand '-' | sed -r 's/(\S+)$/-\1/' >> $outBase.us
bedtools sort -i $outBase.us | gzip -c > $outBase.bedGraph.gz
rm $outBase.usMore help on the genomecov command: https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html