Tools for processing fastq C-HiC files

Map and parse reads

hicup_tool

class CHiC.tool.hicup_tool.hicup(configuration=None)[source]

Tool to run HiCUP, from FASTQ to BAM files

digest_genome(genome_name, re_enzyme, genome_loc, re_enzyme2)[source]

This function takes a genome and digests it using the specified restriction enzyme.

Parameters:
  • genome_name (str) – name of the output genome
  • re_enzyme (str) – recognition sequence and name of the enzyme used to cut the genome, with ^ marking the cut position, e.g. A^GATCT,BglII
  • genome_loc (str) – location of the genome in FASTA format
  • re_enzyme2 (str) – second, optional restriction site. It refers to a second enzymatic digestion, used in place of other DNA shearing techniques such as sonication. This restriction site does NOT form a Hi-C ligation junction. It is only needed when the Hi-C sonication protocol is not followed; typically the sonication protocol is followed.
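
To make the re_enzyme format concrete, the following is a minimal sketch of how a string such as A^GATCT,BglII could be split into its parts and used to locate cut positions in a sequence. The helpers parse_re_enzyme and find_cut_sites are hypothetical and are not part of CHiC or HiCUP; the actual digestion is performed by HiCUP on the FASTA file.

```python
def parse_re_enzyme(re_enzyme):
    """Split 'A^GATCT,BglII' into (site, cut_offset, name).

    Hypothetical helper for illustration only.
    """
    pattern, name = re_enzyme.split(",")
    cut_offset = pattern.index("^")   # number of bases before the cut
    site = pattern.replace("^", "")   # recognition sequence without the marker
    return site, cut_offset, name


def find_cut_sites(sequence, site, cut_offset):
    """Return the 0-based positions at which the sequence would be cut."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return cuts


site, cut_offset, name = parse_re_enzyme("A^GATCT,BglII")
cuts = find_cut_sites("ccAGATCTggggAGATCTtt", site, cut_offset)
```

Here cuts holds one position per occurrence of the recognition sequence; consecutive positions delimit the restriction fragments.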
static get_hicup_params(params)[source]

Function to handle the extraction of command line parameters and format them for use with HiCUP

Parameters:params (dict) –
--bowtie Specify the path to Bowtie
--bowtie2 Specify the path to Bowtie 2
--config Specify the configuration file
--digest Specify the digest file listing restriction fragment co-ordinates
--example Produce an example configuration file
--format Specify the FASTQ format. Options: Sanger, Solexa_Illumina_1.0, Illumina_1.3, Illumina_1.5
--help Print help message and exit
--index Path to the relevant reference genome Bowtie/Bowtie2 indices
--keep Keep intermediate pipeline files
--longest Maximum allowable insert size (bp)
--nofill Hi-C protocol did NOT include a fill-in of sticky ends prior to the ligation step, and therefore FASTQ reads are truncated at the Hi-C restriction enzyme cut site sequence, if it is encountered
--outdir Directory to write output files
--quiet Suppress progress reports (except warnings)
--shortest Minimum allowable insert size (bp)
--temp Write intermediate files (i.e. all except summary files and files generated by HiCUP Deduplicator) to a specified directory
--threads Specify the number of threads, allowing simultaneous processing of multiple files
--version Print the program version and exit
--zip Compress output
Returns:
Return type:list
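
The key names expected in params are not documented here. As a minimal sketch of the general idea, assuming the dict is keyed by HiCUP option name (a hypothetical convention, as is the helper format_cli_params), a dict could be flattened into a command line list like this:

```python
def format_cli_params(params, flags=("--keep", "--nofill", "--quiet", "--zip")):
    """Flatten an options dict into a list of command line arguments.

    Hypothetical helper: options listed in `flags` take no value and are
    emitted only when truthy; every other option is emitted with its value.
    """
    command = []
    for option in sorted(params):      # sorted for a reproducible order
        value = params[option]
        if option in flags:
            if value:
                command.append(option)
        elif value is not None:
            command.extend([option, str(value)])
    return command


command_params = format_cli_params(
    {"--threads": 4, "--zip": True, "--quiet": False, "--longest": 800}
)
```

The resulting list can be appended to the program name before being handed to a process runner, which avoids building a shell string by hand.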
hicup_alig_filt(params, genome_digest, genome_index, genome_loc, fastq1, fastq2, outdir_tar)[source]

This function aligns the Hi-C reads to a reference genome and filters them

Parameters:
  • bowtie2_loc
  • genome_index (str) – location of the genome indexed with Bowtie2
  • digest_genome (str) – location of the digested genome
  • fastq1 (str) – location of the fastq1 file
  • fastq2 (str) – location of the fastq2 file
Returns:

Return type:

Bool

hicup_alig_filt_runner(**kwargs)[source]

This function runs hicup_alig_filt

Parameters:
  • bowtie2_loc
  • genome_index (str) – location of the genome indexed with Bowtie2
  • digest_genome (str) – location of the digested genome
  • fastq1 (str) – location of the fastq1 file
  • fastq2 (str) – location of the fastq2 file
Returns:

Return type:

Bool

run(input_files, metadata, output_files)[source]

Function that runs all the other functions and passes the parameters to them

Parameters:
  • input_files (dict) –
  • metadata (dict) –
  • output_files (dict) –
untar_index(genome_file_name, genome_idx, bt2_1_file, bt2_2_file, bt2_3_file, bt2_4_file, bt2_rev1_file, bt2_rev2_file)[source]

Extracts the Bowtie2 index files from the genome index tar file.

Parameters:
  • genome_file_name (str) – Location string of the genome fasta file
  • genome_idx (str) – Location of the Bowtie2 index file
  • bt2_1_file (str) – Location of the <genome>.1.bt2 index file
  • bt2_2_file (str) – Location of the <genome>.2.bt2 index file
  • bt2_3_file (str) – Location of the <genome>.3.bt2 index file
  • bt2_4_file (str) – Location of the <genome>.4.bt2 index file
  • bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
  • bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file

Returns:Boolean indicating if the task was successful
Return type:bool
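
The extraction step can be pictured with the standard tarfile module. The sketch below is not the implementation of untar_index; the helper extract_bt2_files and its flattening of member paths are assumptions made for illustration.

```python
import os
import tarfile


def extract_bt2_files(index_tar, out_dir):
    """Copy every *.bt2 member of index_tar into out_dir.

    Hypothetical helper: member paths are flattened to their base name,
    so nothing can be written outside out_dir.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with tarfile.open(index_tar) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".bt2")):
                continue
            target = os.path.join(out_dir, os.path.basename(member.name))
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target)
    return sorted(extracted)
```

A Bowtie2 index built for a small genome consists of the six files listed in the parameters above, so a caller could check that all six base names are present in the returned list before reporting success.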

Create CHiCAGO input files

makeRmap

makeBaitmap

makeDesignFiles

class CHiC.tool.makeDesignFiles.makeDesignFilesTool(configuration=None)[source]

Tool for making the design files that are part of the input for CHiCAGO capture Hi-C

static get_design_params(params)[source]

This function handles the CHiCAGO parameters, selecting the given ones and passing them to the command line.

makeDesignFiles(**kwargs)[source]

Makes the design files and stores them in the specified design folder. It is a wrapper of makeDesignFiles.py

Parameters:
  • designDir (str) – Path to the folder with the output files (recommended: the same folder as the .rmap and .baitmap files).
  • parameters (dict) – parameters already selected by get_makeDesignFiles_params().
Returns:

  • bool
  • outFilePrefix (str) – writes the output files in the defined location

run(input_files, input_metadata, output_files)[source]

The main function to run makeDesignFiles.

Parameters:
  • input_files (dict) – designDir : path to the designDir containing the .rmap and .baitmap files
  • input_metadata (dict) –
  • output_files (dict) –
    outFilePrefix : path to the output folder and prefix name of the files,
    for example “/folder1/folder2/prefixname”. It is recommended to use the path to designDir and the same prefix as the .rmap and .baitmap files
Returns:

  • output_files (dict) – List of locations for the output files.
  • output_metadata (dict) – List of matching metadata dict objects.
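
The recommendation for outFilePrefix can be illustrated with a small sketch. The helper design_prefix and the file name prefixname.rmap are hypothetical; only the designDir and outFilePrefix keys come from the description above.

```python
import os


def design_prefix(design_dir, rmap_name):
    """Build an output prefix that reuses the .rmap file's prefix.

    Hypothetical helper for illustration only.
    """
    prefix, _extension = os.path.splitext(rmap_name)
    return os.path.join(design_dir, prefix)


design_dir = "/folder1/folder2"
input_files = {"designDir": design_dir}
output_files = {"outFilePrefix": design_prefix(design_dir, "prefixname.rmap")}
```

Keeping the design files next to the .rmap and .baitmap files under a shared prefix means a single folder can be passed on to the later CHiCAGO step.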

Convert bam files into chicago input

bam2chicago

Normalize data and call C-HiC peaks

run_chicago

class CHiC.tool.run_chicago.ChicagoTool(configuration=None)[source]

Tool for running the CHiCAGO algorithm

chicago(**kwargs)[source]

Run and annotate the Capture Hi-C peaks. Chicago will create 4 folders under the output_prefix:

  • data:
      output_index.Rds -> Chicago data saved in Rds format
      output_index_params.txt -> parameters used to run Chicago
      output_index.export_format -> Chicago output in the chosen format
  • diag_plots: 3 plots to assess the quality of the output (see the CHiCAGO Capture Hi-C documentation for details)
  • enrichment_data: files for the feature enrichment output (if it is used)
  • examples: output_index_proxExamples.pdf: randomly chosen peaks showing interaction regions

See http://regulatorygenomicsgroup.org/chicago for more information.

Parameters:
  • input_files (str, or comma-separated list if there is more than one replicate) –
  • output_prefix (str) –
  • output_dir (str) – whole path for the output
  • params (dict) –
Returns:

writes the output files in the defined location

Return type:

bool

static get_chicago_params(params)[source]

Function to handle the extraction of command line parameters and format them for use with CHiCAGO

Parameters:params (dict) –
Returns:
Return type:list
run(input_files, input_metadata, output_files)[source]

The main function to run Chicago for peak calling. The input files are .chinput files, produced from BAM files using bam2chicago.sh. The input can be a single file or a comma-separated list of files from more than one biological replicate. Technical replicates should be pooled into one .chinput file.

Parameters:
  • input_files (dict) – list of .chinput files, or str with a single .chinput file
  • input_metadata (dict) –
  • output_files (dict) – dict with the output path
Returns:

  • output_files (Dict) – List of locations for the output files
  • output_metadata (Dict) – List of matching metadata dict objects
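
Because the .chinput input may arrive as a single path, a comma-separated string, or a list, a caller may want to normalise it first. The following is a minimal sketch; split_chinput is a hypothetical helper and not part of ChicagoTool.

```python
def split_chinput(chinput):
    """Return a list of .chinput paths from a str or a list.

    Hypothetical helper: a string is split on commas, and surrounding
    whitespace and empty entries are dropped.
    """
    if isinstance(chinput, str):
        chinput = chinput.split(",")
    return [path.strip() for path in chinput if path.strip()]


replicates = split_chinput("rep1.chinput, rep2.chinput")
```

Each entry is expected to correspond to one biological replicate, with technical replicates already pooled.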

static untar_chinput(chinput_tar)[source]

This function takes the chinput tar file as input and untars it

Parameters:chinput_tar (str) – path to the tar file; the files inside the archive should have the same prefix name as the tar file
Returns:
Return type:list of untarred files