Tools

bam_utils

class mg_common.tool.bam_utils.bamUtils[source]

Tool for handling bam files

static bam_copy(bam_in, bam_out)[source]

Wrapper function to copy from one bam file to another

Parameters:
  • bam_in (str) – Location of the input bam file
  • bam_out (str) – Location of the output bam file
static bam_count_reads(bam_file, aligned=False)[source]

Wrapper to count the number of (aligned) reads in a bam file

static bam_filter(bam_file, bam_file_out, filter_name)[source]

Wrapper for filtering out reads from a bam file

Parameters:
  • bam_file (str) – Location of the input bam file
  • bam_file_out (str) – Location of the output bam file
  • filter_name (str) –
    One of:
    duplicate - Read is PCR or optical duplicate (1024)
    unmapped - Read is unmapped or not the primary alignment (260)
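
The numbers 1024 and 260 are SAM flag bitmasks: 1024 is the duplicate bit, and 260 combines the unmapped bit (4) with the "not primary alignment" bit (256). The sketch below shows how such a mask can be tested against a read's flag. It is not the bam_filter implementation; the helper name is_filtered_out is hypothetical, and it assumes a read is dropped when any bit of the mask is set, as samtools view -F does.

```python
# SAM flag bits as defined by the SAM specification.
FLAG_UNMAPPED = 4        # read is unmapped
FLAG_SECONDARY = 256     # not the primary alignment
FLAG_DUPLICATE = 1024    # PCR or optical duplicate

# Masks matching the two filter names documented above.
FILTER_MASKS = {
    "duplicate": FLAG_DUPLICATE,                 # 1024
    "unmapped": FLAG_UNMAPPED | FLAG_SECONDARY,  # 4 + 256 = 260
}


def is_filtered_out(flag, filter_name):
    """Hypothetical helper: True if any bit of the filter mask is set in flag.

    Assumption: "any bit set" semantics, mirroring `samtools view -F`.
    """
    return bool(flag & FILTER_MASKS[filter_name])


# A reverse-strand duplicate (16 + 1024) is caught by the duplicate filter,
# while a plain reverse-strand read (16) passes the unmapped filter.
print(is_filtered_out(16 + 1024, "duplicate"))
print(is_filtered_out(16, "unmapped"))
```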
static bam_index(bam_file, bam_idx_file)[source]

Wrapper for the pysam SAMtools index function

Parameters:
  • bam_file (str) – Location of the bam file that is to be indexed
  • bam_idx_file (str) – Location of the bam index file (.bai)
static bam_list_chromosomes(bam_file)[source]

Wrapper to list the chromosome names that are present within the bam file

Parameters:bam_file (str) – Location of the bam file
Returns:List of the names of the chromosomes that are present in the bam file
Return type:list
static bam_merge(*args)[source]

Wrapper for the pysam SAMtools merge function

Parameters:
  • bam_file_1 (str) – Location of the bam file to merge into
  • bam_file_2 (str) – Location of the bam file that is to get merged into bam_file_1
static bam_paired_reads(bam_file)[source]

Wrapper to test if a bam file contains paired end reads

static bam_sort(bam_file)[source]

Wrapper for the pysam SAMtools sort function

Parameters:bam_file (str) – Location of the bam file to sort
static bam_split(bam_file_in, bai_file, chromosome, bam_file_out)[source]

Wrapper to extract a single chromosome's worth of reads into a new bam file

Parameters:
  • bam_file_in (str) – Location of the input bam file
  • bai_file (str) – Location of the bam index file. This needs to be in the same directory as the bam_file_in
  • chromosome (str) – Name of the chromosome whose alignments are to be extracted
  • bam_file_out (str) – Location of the output bam file
static bam_stats(bam_file)[source]

Wrapper for the pysam SAMtools flagstat function

Parameters:bam_file (str) – Location of the bam file
Returns:list – qc_passed : int; qc_failed : int; description : str
Return type:dict
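
As an illustration of the fields listed above, the sketch below parses flagstat-style text into one record per line. It assumes the result is a list of dicts with the keys qc_passed, qc_failed and description, which the field listing suggests but does not confirm. The function name parse_flagstat and the sample text are made up for this example.

```python
def parse_flagstat(text):
    """Hypothetical parser for lines of the form '<passed> + <failed> <description>'."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split(" ")
        rows.append({
            "qc_passed": int(fields[0]),          # count before the '+'
            "qc_failed": int(fields[2]),          # count after the '+'
            "description": " ".join(fields[3:]),  # remainder of the line
        })
    return rows


# Illustrative input only; real flagstat output has more lines.
SAMPLE = (
    "10 + 2 in total (QC-passed reads + QC-failed reads)\n"
    "8 + 1 mapped (80.00% : 50.00%)\n"
)

for row in parse_flagstat(SAMPLE):
    print(row["qc_passed"], row["qc_failed"], row["description"])
```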
static check_header(bam_file)[source]

Wrapper that uses the pysam SAMtools functions to check whether a bam file is sorted

Parameters:bam_file (str) – Location of the bam file
Returns:True if the file has been sorted
Return type:bool
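
One common way to tell whether an alignment file is sorted is to read the SO field of the @HD header line, which the SAM specification defines with the values unknown, unsorted, queryname and coordinate. The sketch below works on header text only and is not the check_header implementation; whether check_header inspects this field is an assumption, and the function names are hypothetical.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line, or 'unknown' if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab separated, e.g. "@HD\tVN:1.6\tSO:coordinate".
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def is_coordinate_sorted(header_text):
    """Assumption: 'sorted' here means coordinate-sorted."""
    return sort_order(header_text) == "coordinate"


header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(is_coordinate_sorted(header))
```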
static sam_to_bam(sam_file, bam_file)[source]

Function for converting sam files to bam files

@Task Helper Functions

These are helper functions for bam_utils that let its functions operate on tasks where the files are in COMPSs but have not been returned to the user's workspace.

common

class mg_common.tool.common.common[source]

Common functions that can be used generically across tools and pipelines

static to_output_file(input_file, output_file, empty=True)[source]

When handling output files within the @task function, the results should be copied into the correct output files by reading from the source file and writing to the destination file, rather than by renaming.

In cases where there is a known set of output files, if the input file is missing then a blank file should be created and handled by the run() function of the tool. If an empty file should not be created then the empty parameter should be set to False.

Parameters:
  • input_file (str) – Location of the input file
  • output_file (str) – Location of the output file
  • empty (bool) – In cases where the input_file is missing an empty output_file is created. Should be set to False if no file should be created.
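
The behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the name to_output_file_sketch is hypothetical, and the boolean return value (True when an output file was written) is an assumption, since the documentation does not state what the function returns.

```python
import os


def to_output_file_sketch(input_file, output_file, empty=True):
    """Copy input_file to output_file by reading and writing, not renaming."""
    if os.path.isfile(input_file):
        with open(input_file, "rb") as f_in, open(output_file, "wb") as f_out:
            # Copy in 1 MiB chunks so large files are not held in memory.
            for chunk in iter(lambda: f_in.read(1024 * 1024), b""):
                f_out.write(chunk)
        return True

    if empty:
        # Input is missing: create a blank output file for run() to handle.
        open(output_file, "wb").close()
        return True

    # Input is missing and no placeholder was requested.
    return False
```

With empty=False and a missing input, the sketch leaves the file system untouched, which lets the caller decide how to report the missing result.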
static zip_file(location)[source]

Use pigz (gzip as a fallback) to compress a file

Parameters:location (str) – Location of the file to be zipped
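
A fallback of this kind can be sketched by looking for pigz on the PATH and using gzip when it is not found. The function names below are hypothetical, and how zip_file itself selects and invokes the compressor is not documented here, so treat this as one possible approach. Both pigz and gzip replace the file with a .gz file by default.

```python
import shutil
import subprocess


def pick_compressor(which=shutil.which):
    """Return 'pigz' if it is on the PATH, otherwise 'gzip'."""
    for name in ("pigz", "gzip"):
        if which(name):
            return name
    raise RuntimeError("neither pigz nor gzip was found on the PATH")


def zip_file_sketch(location):
    """Compress the file in place and return the expected .gz path."""
    tool = pick_compressor()
    # check=True raises CalledProcessError if the compressor fails.
    subprocess.run([tool, location], check=True)
    return location + ".gz"
```

The which argument exists only so the selection logic can be exercised without either program installed.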