Tools for processing fastq C-HiC files¶
Map and parse reads¶
hicup_tool¶
-
class
CHiC.tool.hicup_tool.
hicup
(configuration=None)[source]¶ Tool to run HiCUP, from FASTQ to BAM files
-
digest_genome
(genome_name, re_enzyme, genome_loc, re_enzyme2)[source]¶ This function takes a genome and digests it using the specified restriction enzyme
Parameters: - genome_name (str) – name of the output genome
- re_enzyme (str) – restriction enzyme used to cut the genome, given as the cut site followed by the enzyme name (for example, A^GATCT,BglII).
- genome_loc (str) – location of the genome in FASTA format
- re_enzyme2 (str) – second, optional restriction enzyme. Restriction site 2 refers to the second enzymatic digestion, which can be replaced by other DNA shearing techniques such as sonication. This restriction site does NOT form a Hi-C ligation junction. It is only needed when the Hi-C sonication protocol is not followed; typically the sonication protocol is followed.
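To illustrate the re_enzyme format, the sketch below splits a value such as A^GATCT,BglII into its recognition sequence, cut offset and enzyme name, then lists the cut positions in a short sequence. The helpers parse_re_enzyme and find_cut_sites are hypothetical names written for this example; they are not part of CHiC or HiCUP, and the actual digestion is carried out by HiCUP.

```python
def parse_re_enzyme(re_enzyme):
    """Split 'A^GATCT,BglII' into (recognition sequence, cut offset, enzyme name)."""
    site, name = re_enzyme.split(",")
    if site.count("^") != 1:
        raise ValueError("expected exactly one '^' marking the cut position")
    cut_offset = site.index("^")
    return site.replace("^", ""), cut_offset, name.strip()


def find_cut_sites(sequence, recognition_seq, cut_offset):
    """Return the 0-based index of the first base after each cut (forward strand)."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(recognition_seq)
    while start != -1:
        cuts.append(start + cut_offset)
        # Move one base forward so overlapping sites are also found.
        start = sequence.find(recognition_seq, start + 1)
    return cuts


recognition_seq, cut_offset, enzyme = parse_re_enzyme("A^GATCT,BglII")
cuts = find_cut_sites("TTAGATCTGGGAGATCTCC", recognition_seq, cut_offset)
print(enzyme, recognition_seq, cuts)
```

The caret marks where the enzyme cuts inside the recognition sequence, so the offset is simply its index in the site string.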
-
static
get_hicup_params
(params)[source]¶ Function to handle the extraction of command-line parameters and format them for use with HiCUP
Parameters: params (dict) – HiCUP command-line options:
- --bowtie: Specify the path to Bowtie
- --bowtie2: Specify the path to Bowtie 2
- --config: Specify the configuration file
- --digest: Specify the digest file listing restriction fragment co-ordinates
- --example: Produce an example configuration file
- --format: Specify FASTQ format. Options: Sanger, Solexa_Illumina_1.0, Illumina_1.3, Illumina_1.5
- --help: Print help message and exit
- --index: Path to the relevant reference genome Bowtie/Bowtie2 indices
- --keep: Keep intermediate pipeline files
- --longest: Maximum allowable insert size (bps)
- --nofill: Hi-C protocol did NOT include a fill-in of sticky ends prior to the ligation step, and therefore FASTQ reads shall be truncated at the Hi-C restriction enzyme cut site (if present)
- --outdir: Directory to write output files
- --quiet: Suppress progress reports (except warnings)
- --shortest: Minimum allowable insert size (bps)
- --temp: Write intermediate files (i.e. all except summary files and files generated by HiCUP Deduplicator) to a specified directory
- --threads: Specify the number of threads, allowing simultaneous processing of multiple files
- --version: Print the program version and exit
- --zip: Compress output
Returns: Return type: list
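A minimal sketch of how a parameter dictionary could be turned into the list of command-line tokens described above. The function format_hicup_params and the dictionary keys (option names without the leading dashes) are assumptions made for this example; the keys that get_hicup_params actually expects are not documented here. The informational options --example, --help and --version are left out because they do not run the pipeline.

```python
# Options that take a value and options that are simple switches,
# taken from the HiCUP option list above.
VALUE_OPTIONS = ("bowtie", "bowtie2", "config", "digest", "format", "index",
                 "longest", "outdir", "shortest", "temp", "threads")
FLAG_OPTIONS = ("keep", "nofill", "quiet", "zip")


def format_hicup_params(params):
    """Turn a dict of options into a flat list of command-line tokens.

    Hypothetical helper: the key names are an assumption, not the keys
    used by get_hicup_params.
    """
    command_params = []
    for option in VALUE_OPTIONS:
        if option in params:
            command_params += ["--" + option, str(params[option])]
    for option in FLAG_OPTIONS:
        # Switches are emitted only when set to a true value.
        if params.get(option):
            command_params.append("--" + option)
    return command_params


args = format_hicup_params({"threads": 4, "longest": 800, "shortest": 150,
                            "zip": True, "quiet": False, "unrelated": "x"})
print(args)
```

Keys that are not HiCUP options are ignored, so the same dictionary can carry settings for other tools.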
-
hicup_alig_filt
(params, genome_digest, genome_index, genome_loc, fastq1, fastq2, outdir_tar)[source]¶ This function aligns the Hi-C reads to a reference genome and filters them
Parameters: - bowtie2_loc –
- genome_index (str) – location of genome indexed with bowtie2
- digest_genome (str) – location of the digested genome
- fastq1 (str) – location of the fastq1 file
- fastq2 (str) – location of the fastq2 file
Returns: Return type: Bool
-
hicup_alig_filt_runner
(**kwargs)[source]¶ This function runs the hicup_alig_filt
Parameters: - bowtie2_loc –
- genome_index (str) – location of genome indexed with bowtie2
- digest_genome (str) – location of the digested genome
- fastq1 (str) – location of the fastq1 file
- fastq2 (str) – location of the fastq2 file
Returns: Return type: Bool
-
run
(input_files, metadata, output_files)[source]¶ Function that runs all the functions and passes them their parameters
Parameters: - input_files (dict) –
- metadata (dict) –
- output_files (dict) –
-
untar_index
(genome_file_name, genome_idx, bt2_1_file, bt2_2_file, bt2_3_file, bt2_4_file, bt2_rev1_file, bt2_rev2_file)[source]¶ Extracts the Bowtie2 index files from the genome index tar file.
Parameters: - genome_file_name (str) – Location string of the genome fasta file
- genome_idx (str) – Location of the Bowtie2 index file
- bt2_1_file (str) – Location of the <genome>.1.bt2 index file
- bt2_2_file (str) – Location of the <genome>.2.bt2 index file
- bt2_3_file (str) – Location of the <genome>.3.bt2 index file
- bt2_4_file (str) – Location of the <genome>.4.bt2 index file
- bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
- bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file
Returns: Boolean indicating if the task was successful Return type: bool
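The extraction step can be sketched with the standard tarfile module. This is not the untar_index implementation: extract_bt2_index, the archive name genome.fa.bt2.tar.gz and the placeholder files are assumptions used only to show the six Bowtie2 index files being pulled out of a tar archive.

```python
import os
import shutil
import tarfile
import tempfile

BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def extract_bt2_index(genome_idx, out_dir):
    """Extract the Bowtie2 index files from a tar archive into out_dir.

    Returns the sorted extracted paths; raises ValueError unless exactly
    six index files were found.
    """
    extracted = []
    with tarfile.open(genome_idx) as tar:  # compression is detected automatically
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and name.endswith(BT2_SUFFIXES):
                target = os.path.join(out_dir, name)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    if len(extracted) != len(BT2_SUFFIXES):
        raise ValueError("incomplete Bowtie2 index in " + genome_idx)
    return sorted(extracted)


# Demonstration with placeholder files standing in for a real index.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "src")
    out_dir = os.path.join(tmp, "out")
    os.makedirs(src_dir)
    os.makedirs(out_dir)
    archive = os.path.join(tmp, "genome.fa.bt2.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for suffix in BT2_SUFFIXES:
            path = os.path.join(src_dir, "genome" + suffix)
            with open(path, "wb") as handle:
                handle.write(b"placeholder")
            tar.add(path, arcname="genome/genome" + suffix)
    index_files = [os.path.basename(p) for p in extract_bt2_index(archive, out_dir)]

print(index_files)
```

Writing each member under its base name flattens any directory stored in the archive, so the index files end up side by side where Bowtie2 expects them.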
-
Create CHiCAGO input files¶
makeRmap¶
makeBaitmap¶
makeDesignFiles¶
-
class
CHiC.tool.makeDesignFiles.
makeDesignFilesTool
(configuration=None)[source]¶ Tool for making the design files that are part of the input for CHiCAGO capture Hi-C
-
static
get_design_params
(params)[source]¶ This function handles the CHiCAGO parameters, selecting the ones that were given and passing them to the command line.
-
makeDesignFiles
(**kwargs)[source]¶ Makes the design files and stores them in the specified design folder. It is a wrapper around makeDesignFiles.py
Parameters: - designDir (str,) – Path to the folder with the output files(recommended the same folder as .map and .baitmap files).
- parameters (dict,) – list of parameter already selected by get_makeDesignFiles_params().
Returns: - bool
- outFilePrefix (str) – writes the output files in the defined location
-
run
(input_files, input_metadata, output_files)[source]¶ The main function to run makeDesignFiles.
Parameters: - input_files (dict) – designDir : path to the designDir containing the .rmap and .baitmap files
- input_metadata (dict) –
- output_files (dict) –
- outFilePrefix : path to the output folder and prefix name of files
- example: “/folder1/folder2/prefixname”. Recommended to use the path to designDir and the same prefix as .rmap and .baitmap
Returns: - output_files (dict) – List of locations for the output files.
- output_metadata (dict) – List of matching metadata dict objects.
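The recommendation to reuse the designDir path and the .rmap/.baitmap prefix for outFilePrefix can be sketched as follows. The helper default_out_file_prefix and the file names design.rmap and design.baitmap are hypothetical; the tool itself takes outFilePrefix as given.

```python
from pathlib import Path
import tempfile


def default_out_file_prefix(design_dir):
    """Build an outFilePrefix from designDir and the shared .rmap/.baitmap name."""
    design_dir = Path(design_dir)
    rmaps = sorted(design_dir.glob("*.rmap"))
    baitmaps = sorted(design_dir.glob("*.baitmap"))
    if len(rmaps) != 1 or len(baitmaps) != 1:
        raise ValueError("expected exactly one .rmap and one .baitmap file")
    if rmaps[0].stem != baitmaps[0].stem:
        raise ValueError(".rmap and .baitmap files do not share a prefix")
    return str(design_dir / rmaps[0].stem)


# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("design.rmap", "design.baitmap"):
        (Path(tmp) / name).touch()
    prefix = default_out_file_prefix(tmp)
    prefix_name = Path(prefix).name
    prefix_in_design_dir = Path(prefix).parent == Path(tmp)

print(prefix_name)
```

Keeping the design files next to the .rmap and .baitmap files under one prefix means a single folder can be handed to CHiCAGO as its design directory.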
-
static
Normalize data and call C-HiC peaks¶
run_chicago¶
-
class
CHiC.tool.run_chicago.
ChicagoTool
(configuration=None)[source]¶ Tool for running the CHiCAGO algorithm
-
chicago
(**kwargs)[source]¶ Run CHiCAGO and annotate the Capture Hi-C peaks. CHiCAGO creates 4 folders under output_prefix:
- data : output_index.Rds (CHiCAGO data saved in Rds format), output_index_params.txt (parameters used to run CHiCAGO) and output_index.export_format (CHiCAGO output in the chosen format)
- diag_plots : 3 plots to assess the quality of the output (see the CHiCAGO documentation for details)
- enrichment_data : files for the feature enrichment output (if it is used)
- examples : output_index_proxExamples.pdf, randomly chosen peaks showing interaction regions
See http://regulatorygenomicsgroup.org/chicago for more information.
Parameters: - input_files (str, or comma-separated list if there is more than one replicate) –
- output_prefix (str) –
- output_dir (str (whole path for the output)) –
- params (dict) –
Returns: writes the output files in the defined location
Return type: bool
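A short sketch for checking that the four output folders listed above exist after a run. It assumes the folders sit at output_dir/output_prefix/<folder>, which is an interpretation of the description rather than documented behaviour; expected_output_dirs and missing_output_dirs are hypothetical helpers.

```python
import os
import tempfile

# Folder names as listed in the description of chicago().
CHICAGO_SUBDIRS = ("data", "diag_plots", "enrichment_data", "examples")


def expected_output_dirs(output_dir, output_prefix):
    """Map each folder name to its assumed path under output_dir/output_prefix."""
    base = os.path.join(output_dir, output_prefix)
    return {name: os.path.join(base, name) for name in CHICAGO_SUBDIRS}


def missing_output_dirs(output_dir, output_prefix):
    """Return the names of the expected folders that do not exist on disk."""
    dirs = expected_output_dirs(output_dir, output_prefix)
    return [name for name, path in dirs.items() if not os.path.isdir(path)]


# Demonstration: only two of the four folders are created.
with tempfile.TemporaryDirectory() as tmp:
    dirs = expected_output_dirs(tmp, "output_prefix")
    for name in ("data", "diag_plots"):
        os.makedirs(dirs[name])
    missing = missing_output_dirs(tmp, "output_prefix")

print(missing)
```

Since enrichment_data is only produced when feature enrichment is used, a missing enrichment_data folder is not necessarily an error.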
-
static
get_chicago_params
(params)[source]¶ Function to handle the extraction of command-line parameters and format them for use with CHiCAGO
Parameters: params (dict) – Returns: Return type: list
-
run
(input_files, input_metadata, output_files)[source]¶ The main function to run CHiCAGO for peak calling. The input files are .chinput files, which are produced from BAM files using bam2chicago.sh. The input can be a single file or a comma-separated list of files from more than one biological replicate. Technical replicates should be pooled into one .chinput file.
Parameters: - input_files (dict) – list of .chinput files, or str with a single .chinput file
- input_metadata (dict) –
- output_files (dict with the output path) –
Returns: - output_files (Dict) – List of locations for the output files
- output_metadata (Dict) – List of matching metadata dict objects
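Because the input may be a single .chinput path, a comma-separated string or a list, a caller might normalise it before running the tool. The helper normalise_chinput and the file names rep1.chinput and rep2.chinput are hypothetical and only illustrate the accepted shapes; they are not part of ChicagoTool.

```python
def normalise_chinput(input_files):
    """Return a list of .chinput paths from a str, a comma-separated str, or a list."""
    if isinstance(input_files, str):
        candidates = input_files.split(",")
    else:
        candidates = list(input_files)
    paths = [path.strip() for path in candidates if path.strip()]
    bad = [path for path in paths if not path.endswith(".chinput")]
    if not paths or bad:
        raise ValueError("expected .chinput files, got: " + repr(bad or input_files))
    return paths


single = normalise_chinput("rep1.chinput")
several = normalise_chinput("rep1.chinput, rep2.chinput")
print(single, several)
```

Each entry should correspond to one biological replicate, with technical replicates already pooled into a single .chinput file.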
-