Tools for processing fastq C-HiC files¶
Map and parse reads¶
hicup_tool¶
-
class
CHiC.tool.hicup_tool.
hicup
(configuration=None)[source]¶ Tool to run hicup, from fastq to bam files
-
digest_genome
(genome_name, re_enzyme, genome_loc, re_enzyme2)[source]¶ This function takes a genome and digests it using the specified restriction enzyme
Parameters: - genome_name (str) – name of the output genome
- re_enzyme (str) – restriction enzyme used to cut the genome, given as cut site and enzyme name, for example A^GATCT,BglII.
- genome_loc (str) – location of the genome in FASTA format
- re_enzyme2 (str) – second, optional restriction enzyme. Restriction site 2 refers to the second enzymatic digestion, which is optional because other DNA shearing techniques such as sonication may be used instead. This restriction site does NOT form a Hi-C ligation junction. It is the enzyme used when the Hi-C sonication protocol is not followed; typically the sonication protocol is followed.
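The re_enzyme format above (cut site with a ^ marking the cut, then a comma and the enzyme name) can be illustrated with a minimal sketch. The helper names parse_enzyme and cut_positions are hypothetical and are not part of CHiC or HiCUP; the sketch only shows how such a string could be read and applied to a sequence, not how digest_genome is implemented.

```python
import re


def parse_enzyme(spec):
    """Split a string such as 'A^GATCT,BglII' into (site, cut_offset, name).

    Hypothetical helper for illustration only.
    """
    pattern, name = spec.split(",", 1)
    cut_offset = pattern.index("^")   # number of bases left of the cut
    site = pattern.replace("^", "")   # recognition sequence without the marker
    return site, cut_offset, name.strip()


def cut_positions(sequence, site, cut_offset):
    """Return 0-based indices of the first base after each cut."""
    # A lookahead is used so that overlapping recognition sites are also found.
    matches = re.finditer("(?=%s)" % re.escape(site), sequence.upper())
    return [m.start() + cut_offset for m in matches]


site, offset, name = parse_enzyme("A^GATCT,BglII")
positions = cut_positions("ccAGATCTggAGATCTtt", site, offset)
print(site, offset, name, positions)
```

Here the toy sequence contains two BglII sites, so two cut positions are reported.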
-
static
get_hicup_params
(params)[source]¶ Function to handle the extraction of command-line parameters and format them for use with hicup
Parameters: params (dict) – dictionary of hicup options:
- --bowtie Specify the path to Bowtie
- --bowtie2 Specify the path to Bowtie 2
- --config Specify the configuration file
- --digest Specify the digest file listing restriction fragment co-ordinates
- --example Produce an example configuration file
- --format Specify FASTQ format. Options: Sanger, Solexa_Illumina_1.0, Illumina_1.3, Illumina_1.5
- --help Print help message and exit
- --index Path to the relevant reference genome Bowtie/Bowtie2 indices
- --keep Keep intermediate pipeline files
- --longest Maximum allowable insert size (bps)
- --nofill Hi-C protocol did NOT include a fill-in of sticky ends prior to the ligation step, so FASTQ reads are truncated at the Hi-C restriction enzyme cut site sequence, if it is encountered
- --outdir Directory to write output files
- --quiet Suppress progress reports (except warnings)
- --shortest Minimum allowable insert size (bps)
- --temp Write intermediate files (i.e. all except summary files and files generated by HiCUP Deduplicator) to a specified directory
- --threads Specify the number of threads, allowing simultaneous processing of multiple files
- --version Print the program version and exit
- --zip Compress output
Return type: list
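As a rough illustration of turning a parameter dictionary into a command-line list, the sketch below uses a hypothetical helper, build_hicup_args. It is not the body of get_hicup_params. The split into switches and valued options is an assumption read off the option descriptions above, and the dictionary keys (option names without the leading dashes) are also an assumption.

```python
def build_hicup_args(params):
    """Convert e.g. {'threads': 4, 'zip': True} into ['--threads', '4', '--zip'].

    Hypothetical helper for illustration only.
    """
    # Assumed switches: options described above without a value.
    switches = {"keep", "nofill", "quiet", "zip"}
    # Assumed valued options: options described above as taking a path or number.
    valued = {"bowtie", "bowtie2", "config", "digest", "format", "index",
              "longest", "outdir", "shortest", "temp", "threads"}

    args = []
    for key in sorted(params):            # sorted for a reproducible order
        value = params[key]
        if key in switches:
            if value:                     # only emit switches that are turned on
                args.append("--" + key)
        elif key in valued and value is not None:
            args.extend(["--" + key, str(value)])
        # Keys that are not recognised are silently ignored in this sketch.
    return args


print(build_hicup_args({"threads": 4, "zip": True, "outdir": "out"}))
```

Ignoring unknown keys keeps unrelated configuration entries from leaking onto the command line; a stricter variant could raise an error instead.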
-
hicup_alig_filt
(**kwargs)[source]¶ This function aligns the Hi-C reads to a reference genome and filters them
Parameters: - bowtie2_loc –
- genome_index (str) – location of genome indexed with bowtie2
- digest_genome (str) – location of genome digested
- fastq1 (str) – location of fastq1 file
- fastq2 (str) – location of fastq2 file
Return type: bool
-
run
(input_files, input_metadata, output_files)[source]¶ Function that runs the tool and passes the parameters to all the functions
Parameters: - input_files (dict) –
- input_metadata (dict) –
- output_files (dict) –
-
untar_index
(**kwargs)[source]¶ Extracts the Bowtie2 index files from the genome index tar file.
Parameters: - genome_file_name (str) – Location string of the genome fasta file
- genome_idx (str) – Location of the Bowtie2 index file
- bt2_1_file (str) – Location of the <genome>.1.bt2 index file
- bt2_2_file (str) – Location of the <genome>.2.bt2 index file
- bt2_3_file (str) – Location of the <genome>.3.bt2 index file
- bt2_4_file (str) – Location of the <genome>.4.bt2 index file
- bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
- bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file
Returns: Boolean indicating if the task was successful Return type: bool
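The idea of pulling the .bt2 index files out of a tar archive can be sketched with the standard library. The function name extract_bt2_members and its flattening of member paths are assumptions made for this example; the actual untar_index method may behave differently.

```python
import os
import tarfile


def extract_bt2_members(tar_path, dest_dir):
    """Extract only the *.bt2 members of tar_path into dest_dir.

    Hypothetical helper for illustration only. Directory components inside
    the archive are dropped, so all index files end up directly in dest_dir.
    """
    extracted = []
    with tarfile.open(tar_path) as tar:   # compression is auto-detected
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".bt2"):
                target = os.path.join(dest_dir, os.path.basename(member.name))
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return sorted(extracted)
```

Writing each member to an explicit target path, rather than calling extractall, avoids creating unexpected directories from the paths stored in the archive.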
-
Create CHiCAGO input files¶
makeRmap¶
makeBaitmap¶
makeDesignFiles¶
-
class
CHiC.tool.makeDesignFiles.
makeDesignFilesTool
(configuration=None)[source]¶ Tool for making the design files that are part of the input for Chicago capture Hi-C
-
static
get_design_params
(params)[source]¶ This function handles the chicago parameters, selecting the given ones and passing them to the command line.
-
makeDesignFiles
(**kwargs)[source]¶ Makes the design files and stores them in the specified design folder. It is a wrapper around makeDesignFiles.py
Parameters: - designDir (str,) – Path to the folder with the output files (recommended: the same folder as the .rmap and .baitmap files).
- parameters (dict,) – parameters already selected by get_makeDesignFiles_params().
Returns: - bool
- outFilePrefix (str) – writes the output files in the defined location
-
run
(input_files, input_metadata, output_files)[source]¶ The main function to run makeDesignFiles.
Parameters: - input_files (dict) – designDir : path to the designDir containing the .rmap and .baitmap files
- input_metadata (dict) –
- output_files (dict) –
- outFilePrefix : path to the output folder and prefix name of files
- example: “/folder1/folder2/prefixname”. Recommended to use the path to designDir and the same prefix as .rmap and .baitmap
Returns: - output_files (dict) – List of locations for the output files.
- output_metadata (dict) – List of matching metadata dict objects.
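The recommendation to reuse the .rmap and .baitmap prefix for outFilePrefix can be shown with a short sketch. The helper design_prefix is hypothetical and only demonstrates the path handling; it is not part of makeDesignFilesTool.

```python
import os


def design_prefix(rmap_path):
    """Derive an output prefix that shares folder and name with the .rmap file.

    Hypothetical helper for illustration only.
    """
    root, ext = os.path.splitext(rmap_path)
    if ext != ".rmap":
        raise ValueError("expected a .rmap file, got: %s" % rmap_path)
    return root   # folder plus file name without the extension


print(design_prefix("/folder1/folder2/prefixname.rmap"))
```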
-
Convert bam files into chicago input¶
bam2chicago¶
-
class
CHiC.tool.bam2chicago_tool.
bam2chicagoTool
(configuration=None)[source]¶ Tool for preprocessing the input files
-
bam2chicago
(**kwargs)[source]¶ Main function that preprocesses the bam files into chinput files, which are part of the input of CHiCAGO. It is a wrapper around bam2chicago.sh.
Parameters: - bamFile (str,) – path to paired-end file produced by a HiC aligner; Chicago has only been tested with data produced by HiCUP (http://www.bioinformatics.babraham.ac.uk/projects/hicup/). However, it should theoretically be possible to use other HiC aligners for this purpose.
- rmapFile (str,) – A tab-separated file of the format <chr> <start> <end> <numeric ID>, describing the restriction digest (or “virtual digest” if pooled fragments are used). These numeric IDs are referred to as “otherEndID” in Chicago. All fragments mapping outside of the digest coordinates will be disregarded by both these scripts and Chicago.
- baitMapFile (str,) – Tab-separated file of the format <chr> <start> <end> <numeric ID> <annotation>, listing the coordinates of the baited/captured restriction fragments (should be a subset of the fragments listed in rmapfile), their numeric IDs (should match those listed in rmapfile for the corresponding fragments) and their annotations (such as, for example, the names of baited promoters). The numeric IDs are referred to as “baitID” in Chicago.
- chinput (str) – name of the output file. bam2chicago creates a folder with the name of this sample; inside that folder, the file chinput.chinput is the final output.
Returns: - bool
- chinput (str,) – name of the sample
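The rmapFile and baitMapFile descriptions above imply a consistency rule: every baited fragment should appear in the rmap with the same coordinates and numeric ID. The following is a minimal sketch of such a check. The function names and the toy data are made up for illustration; neither bam2chicago.sh nor the CHiC tools are claimed to perform this check in this way.

```python
def read_fragments(lines, n_fields):
    """Parse tab-separated fragment lines into {numeric ID: (chr, start, end)}.

    Hypothetical helper for illustration only.
    """
    fragments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < n_fields:
            continue                      # skip blank or malformed lines
        chrom, start, end, frag_id = fields[:4]
        fragments[int(frag_id)] = (chrom, int(start), int(end))
    return fragments


def baits_missing_from_rmap(rmap_lines, baitmap_lines):
    """Return bait IDs that are absent from the rmap or have other coordinates."""
    rmap = read_fragments(rmap_lines, 4)      # <chr> <start> <end> <numeric ID>
    baits = read_fragments(baitmap_lines, 5)  # ... plus <annotation>
    return sorted(i for i, coords in baits.items() if rmap.get(i) != coords)


# Toy data: bait 3 has an end coordinate that differs from the rmap.
rmap = ["chr1\t1\t1000\t1\n", "chr1\t1001\t2500\t2\n", "chr1\t2501\t4000\t3\n"]
baitmap = ["chr1\t1001\t2500\t2\tgeneA\n", "chr1\t2501\t3900\t3\tgeneB\n"]
bad = baits_missing_from_rmap(rmap, baitmap)
print(bad)
```

With real files, the same functions can be fed open file handles, since they only iterate over lines.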
-
run
(input_files, input_metadata, output_files)[source]¶ Function that runs bam2chicago and passes the parameters to it
Parameters: - input_files (dict) –
- hicup_outdir_tar (str) –
- rmapFile (str) –
- baitmapFile (str) –
- input_metadata (dict) –
Returns: - output_files (list)
- List of locations for the output files.
- output_metadata (list)
- List of matching metadata dict objects
-
Normalize data and call C-HiC peaks¶
run_chicago¶
-
class
CHiC.tool.run_chicago.
ChicagoTool
(configuration=None)[source]¶ Tool for running the CHiCAGO algorithm
-
chicago
(**kwargs)[source]¶ Run CHiCAGO and annotate the Capture-HiC peaks. Chicago will create 4 folders under the output_prefix:
- data : output_index.Rds (chicago data saved in Rds format), output_index_params.txt (parameters used to run Chicago), output_index.export_format (chicago output in the chosen format)
- diag_plots : 3 plots to assess the quality of the output (see the CHiCAGO Capture-HiC documentation for details)
- enrichment_data : files for the feature enrichment output (if it is used)
- examples : output_index_proxExamples.pdf, randomly chosen peaks showing interaction regions
See http://regulatorygenomicsgroup.org/chicago for more information.
Parameters: - input_files (str or comma-separated list if there is more than one replicate) –
- output_prefix (str) –
- output_dir (str (whole path for the output)) –
- params (dict) –
Returns: writes the output files in the defined location
Return type: bool
-
static
get_chicago_params
(params)[source]¶ Function to handle the extraction of command-line parameters and format them for use with CHiCAGO
Parameters: params (dict) – Returns: Return type: list
-
run
(input_files, input_metadata, output_files)[source]¶ The main function to run chicago for peak calling. The input files are .chinput files, produced from BAM files by bam2chicago.sh. The input can be a single file or a comma-separated list of files from more than one biological replicate. Technical replicates should be pooled into one .chinput file.
Parameters: - input_files (dict) – list of .chinput files, or str with a single .chinput file
- input_metadata (dict) –
- output_files (dict with the output path) –
Returns: - output_files (Dict) – List of locations for the output files.
- output_metadata (Dict) – List of matching metadata dict objects
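Since the input may be a single .chinput path, a comma-separated string, or a list, a caller might first normalise it to a list. The helper normalise_chinput below is hypothetical and only sketches that step; it does not reflect how ChicagoTool.run handles its input internally.

```python
def normalise_chinput(value):
    """Return a list of .chinput paths from a str, comma-separated str, or list.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # Split on commas and drop surrounding whitespace and empty entries.
        paths = [p.strip() for p in value.split(",") if p.strip()]
    else:
        paths = list(value)

    wrong = [p for p in paths if not p.endswith(".chinput")]
    if wrong:
        raise ValueError("not .chinput files: %s" % ", ".join(wrong))
    return paths


print(normalise_chinput("rep1.chinput, rep2.chinput"))
```

Each entry would correspond to one biological replicate, with technical replicates already pooled as described above.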
-