Tools for processing FASTQ C-HiC files

Map and parse reads

hicup_tool

class CHiC.tool.hicup_tool.hicup(configuration=None)

Tool to run HiCUP, from FASTQ to BAM files.

digest_genome(genome_name, re_enzyme, genome_loc, re_enzyme2)

Takes a genome and digests it using the specified restriction enzyme.

Parameters:

genome_name (str) – name of the output genome
re_enzyme (str) – restriction enzyme used to cut the genome, given as the cut-site pattern followed by the enzyme name, e.g. A^GATCT,BglII
genome_loc (str) – location of the genome in FASTA format
re_enzyme2 (str) – second, optional restriction site. It refers to a second enzymatic digestion, used in place of other DNA shearing techniques such as sonication. This restriction site does NOT form a Hi-C ligation junction. It is only needed when the Hi-C sonication protocol is not followed; typically the sonication protocol is followed.
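
To make the re_enzyme format concrete, the sketch below splits a string such as A^GATCT,BglII into its recognition sequence, cut offset, and enzyme name, then lists the cut positions in a toy sequence. The helper names parse_re_enzyme and cut_positions are hypothetical and are not part of CHiC or HiCUP; the real digestion is done by HiCUP itself.

```python
def parse_re_enzyme(re_enzyme):
    """Split 'A^GATCT,BglII' into (recognition site, cut offset, enzyme name).

    Hypothetical helper for illustration only; not part of CHiC or HiCUP.
    """
    pattern, name = re_enzyme.split(",", 1)
    pattern = pattern.strip()
    name = name.strip()
    if pattern.count("^") != 1:
        raise ValueError("expected exactly one '^' in %r" % pattern)
    cut_offset = pattern.index("^")   # number of bases before the cut
    site = pattern.replace("^", "")   # recognition sequence without the marker
    return site, cut_offset, name


def cut_positions(sequence, site, cut_offset):
    """Return the 0-based positions at which `sequence` would be cut."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return positions


site, offset, enzyme = parse_re_enzyme("A^GATCT,BglII")
print(enzyme, site, cut_positions("TTAGATCTGGAGATCTAA", site, offset))
```

This toy version only scans the forward strand of a single in-memory string and ignores ambiguous bases.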

static get_hicup_params(params)

Handles the extraction of command-line parameters and formats them for use with HiCUP.

Parameters:

params (dict) –

`--bowtie`	Specify the path to Bowtie
`--bowtie2`	Specify the path to Bowtie 2
`--config`	Specify the configuration file
`--digest`	Specify the digest file listing restriction fragment co-ordinates
`--example`	Produce an example configuration file
`--format`	Specify the FASTQ format. Options: Sanger, Solexa_Illumina_1.0, Illumina_1.3, Illumina_1.5
`--help`	Print help message and exit
`--index`	Path to the relevant reference genome Bowtie/Bowtie2 indices
`--keep`	Keep intermediate pipeline files
`--longest`	Maximum allowable insert size (bps)
`--nofill`	Hi-C protocol did NOT include a fill-in of sticky ends prior to the ligation step, so FASTQ reads are truncated at the Hi-C restriction enzyme cut site sequence (if present)
`--outdir`	Directory to write output files
`--quiet`	Suppress progress reports (except warnings)
`--shortest`	Minimum allowable insert size (bps)
`--temp`	Write intermediate files (i.e. all except summary files and files generated by HiCUP Deduplicator) to a specified directory
`--threads`	Specify the number of threads, allowing simultaneous processing of multiple files
`--version`	Print the program version and exit
`--zip`	Compress output

Returns:

Return type: list
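
As a rough illustration of what "formatting them for use with hicup" can mean, the sketch below turns a dict keyed by the option names from the table above into a flat argument list. It is not the actual get_hicup_params implementation: the function name build_hicup_args is hypothetical, and both the key format (with leading dashes) and the split between value-taking options and bare switches are assumptions inferred from the option descriptions.

```python
def build_hicup_args(params):
    """Turn a dict of HiCUP options into a flat command-line argument list.

    Hypothetical sketch; the real get_hicup_params may expect different keys.
    """
    # Options assumed to be bare switches (no value), based on the table above.
    switches = {"--example", "--help", "--keep", "--nofill",
                "--quiet", "--version", "--zip"}
    args = []
    for option in sorted(params):          # sorted for a reproducible order
        value = params[option]
        if option in switches:
            if value:                      # include the switch only when truthy
                args.append(option)
        elif value is not None:
            args.extend([option, str(value)])
    return args


print(build_hicup_args({"--threads": 4, "--zip": True, "--longest": 800}))
```

Returning a list rather than a single string matches the documented return type and lets the result be passed directly to subprocess-style calls without shell quoting.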

hicup_alig_filt(params, genome_digest, genome_index, genome_loc, fastq1, fastq2, outdir_tar)

Aligns the Hi-C reads to a reference genome and filters them.

Parameters:

bowtie2_loc –
genome_index (str) – location of the genome indexed with Bowtie 2
digest_genome (str) – location of the digested genome
fastq1 (str) – location of the fastq1 file
fastq2 (str) – location of the fastq2 file

Returns:

Return type: bool

hicup_alig_filt_runner(**kwargs)

Runs hicup_alig_filt.

Parameters:

bowtie2_loc –
genome_index (str) – location of the genome indexed with Bowtie 2
digest_genome (str) – location of the digested genome
fastq1 (str) – location of the fastq1 file
fastq2 (str) – location of the fastq2 file

Returns:

Return type: bool

run(input_files, metadata, output_files)

Runs the tool and passes the parameters to all the functions.

Parameters:

input_files (dict) –
metadata (dict) –
output_files (dict) –

untar_index(genome_file_name, genome_idx, bt2_1_file, bt2_2_file, bt2_3_file, bt2_4_file, bt2_rev1_file, bt2_rev2_file)

Extracts the Bowtie2 index files from the genome index tar file.

Parameters:

genome_file_name (str) – Location string of the genome FASTA file
genome_idx (str) – Location of the Bowtie2 index file
bt2_1_file (str) – Location of the <genome>.1.bt2 index file
bt2_2_file (str) – Location of the <genome>.2.bt2 index file
bt2_3_file (str) – Location of the <genome>.3.bt2 index file
bt2_4_file (str) – Location of the <genome>.4.bt2 index file
bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file

Returns:	Boolean indicating if the task was successful
Return type:	bool

Create CHiCAGO input files

makeRmap

makeBaitmap

makeDesignFiles

class CHiC.tool.makeDesignFiles.makeDesignFilesTool(configuration=None)

Tool for making the design files that form part of the input for CHiCAGO Capture Hi-C.

static get_design_params(params)

Handles the CHiCAGO parameters, selecting the given ones and passing them to the command line.

makeDesignFiles(**kwargs)

Makes the design files and stores them in the specified design folder. It is a wrapper around makeDesignFiles.py.

Parameters:

designDir (str) – Path to the folder with the output files (recommended: the same folder as the .rmap and .baitmap files).
parameters (dict) – parameters already selected by get_makeDesignFiles_params().

Returns:

bool
outFilePrefix (str) – writes the output files in the defined location

run(input_files, input_metadata, output_files)

The main function to run makeDesignFiles.

Parameters:

input_files (dict) – designDir: path to the design directory containing the .rmap and .baitmap files
input_metadata (dict) –
output_files (dict) –

outFilePrefix: path to the output folder and prefix name of the files

Example: “/folder1/folder2/prefixname”. It is recommended to use the path to designDir and the same prefix as the .rmap and .baitmap files.

Returns:

output_files (dict) – List of locations for the output files.
output_metadata (dict) – List of matching metadata dict objects.
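
The recommendation to reuse the designDir path and the .rmap/.baitmap prefix for outFilePrefix can be sketched as follows. The helper default_out_prefix is hypothetical and not part of CHiC; it assumes designDir contains at least one .rmap file and simply takes the first one in sorted order.

```python
import os


def default_out_prefix(design_dir):
    """Derive an outFilePrefix from the first .rmap file found in design_dir.

    Hypothetical helper for illustration; not part of the CHiC package.
    """
    rmaps = sorted(f for f in os.listdir(design_dir) if f.endswith(".rmap"))
    if not rmaps:
        raise FileNotFoundError("no .rmap file found in %s" % design_dir)
    prefix = os.path.splitext(rmaps[0])[0]   # drop the '.rmap' extension
    return os.path.join(design_dir, prefix)  # e.g. /folder1/folder2/prefixname
```

The returned value has the same shape as the example path above: the design folder followed by the shared file prefix.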

Convert BAM files into CHiCAGO input

bam2chicago

Normalize data and call C-HiC peaks

run_chicago

class CHiC.tool.run_chicago.ChicagoTool(configuration=None)

Tool for running the CHiCAGO algorithm.

chicago(**kwargs)

Runs CHiCAGO and annotates the Capture Hi-C peaks. CHiCAGO creates four folders under output_prefix:

data:
output_index.Rds – CHiCAGO data saved in Rds format
output_index_params.txt – parameters used to run CHiCAGO
output_index.export_format – CHiCAGO output in the chosen format
diag_plots: three plots to assess the quality of the output (see the CHiCAGO Capture Hi-C documentation for details)
enrichment_data: files for the feature enrichment output (if it is used)
examples: output_index_proxExamples.pdf, randomly chosen peaks showing interaction regions

See http://regulatorygenomicsgroup.org/chicago for more information.

Parameters:

input_files (str, or comma-separated list if there is more than one replicate) –
output_prefix (str) –
output_dir (str) – whole path for the output
params (dict) –

Returns: writes the output files in the defined location

Return type: bool

static get_chicago_params(params)

Handles the extraction of command-line parameters and formats them for use with CHiCAGO.

Parameters:	params (dict) –
Returns:
Return type:	list

run(input_files, input_metadata, output_files)

The main function to run CHiCAGO for peak calling. The input files are .chinput files, converted from BAM files using bam2chicago.sh. The input can be a single file or a comma-separated list of files from more than one biological replicate. Technical replicates should be pooled into one .chinput file.

Parameters:

input_files (dict) – list of .chinput files, or str with a single .chinput file
input_metadata (dict) –
output_files (dict) – dict with the output path

Returns:

output_files (dict) – List of locations for the output files
output_metadata (dict) – List of matching metadata dict objects
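
Because the input may be either a single .chinput path or several replicate files, a small normalisation step is useful before handing the value on. The sketch below is an assumption about how that could look, not the code ChicagoTool uses; the name join_chinput is hypothetical.

```python
def join_chinput(chinput):
    """Return a comma-separated string of .chinput paths.

    Accepts a single path (str) or an iterable of paths, one per
    biological replicate. Hypothetical helper, not part of CHiC.
    """
    files = [chinput] if isinstance(chinput, str) else list(chinput)
    if not files:
        raise ValueError("at least one .chinput file is required")
    for path in files:
        if not path.endswith(".chinput"):
            raise ValueError("not a .chinput file: %r" % path)
    return ",".join(files)


print(join_chinput(["rep1.chinput", "rep2.chinput"]))
```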

static untar_chinput(chinput_tar)

Takes the .chinput tar file as input and extracts it.

Parameters:

chinput_tar (str) – path to the tar file; the files inside the tar should have the same prefix name as the tar file

Returns:

Return type: list of untarred files
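
A minimal sketch of the extraction step, using only the standard library, is given below. It is not the CHiC implementation: the function name untar_chinput_sketch and the dest_dir argument are hypothetical, and the sketch assumes the archive holds regular files ending in .chinput.

```python
import os
import shutil
import tarfile


def untar_chinput_sketch(chinput_tar, dest_dir=None):
    """Extract the .chinput files from a tar archive and return their paths.

    Hypothetical illustration; the real untar_chinput may behave differently.
    """
    if dest_dir is None:
        dest_dir = os.path.dirname(os.path.abspath(chinput_tar))
    extracted = []
    with tarfile.open(chinput_tar) as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)  # ignore folders inside the tar
            if not (member.isfile() and name.endswith(".chinput")):
                continue
            target = os.path.join(dest_dir, name)
            # Copy the member's bytes ourselves so nothing is written
            # outside dest_dir, whatever paths the archive contains.
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return sorted(extracted)
```

Flattening member names to their basename keeps every extracted file directly under dest_dir, which is why the result can be returned as a simple list of paths.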