Pipelines¶
Map and parse CHi-C reads¶
This pipeline takes as input two FASTQ files, the restriction enzyme (RE) sites, the genome indexed with GEM, and the same genome as a FASTA file. It uses TADbit to map and filter the reads and to produce a BED file, which is later used to produce a BAM file compatible with the CHiCAGO algorithm. More information about filtering and mapping is available at https://3dgenomes.github.io/TADbit/
Running from the command line¶
Parameters¶
- config : str
- Configuration JSON file
- in_metadata : str
- Location of input JSON metadata for files
- out_metadata : str
- Location of output JSON metadata for files
Returns¶
- Wd : folders and files
- path to the working directory where the output files are
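The three parameters above, together with the `--local` flag used in the example commands on this page, could be parsed along the following lines. This is only a minimal sketch using the standard `argparse` module; `build_parser` is a hypothetical name and this is not necessarily how the pipeline scripts implement their command line.

```python
import argparse


def build_parser():
    """Build a parser for the flags documented on this page (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of a pipeline wrapper CLI")
    parser.add_argument("--config", required=True,
                        help="Configuration JSON file")
    parser.add_argument("--in_metadata", required=True,
                        help="Location of input JSON metadata for files")
    parser.add_argument("--out_metadata", required=True,
                        help="Location of output JSON metadata for files")
    # Assumption: --local is a simple on/off switch for running without COMPSs.
    parser.add_argument("--local", action="store_true",
                        help="Run on a local machine without COMPSs")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--config", "tests/json/config_hicup.json",
    "--in_metadata", "tests/json/input_hicup.json",
    "--out_metadata", "tests/json/output_hicup.json",
    "--local",
])
```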
Example¶
REQUIREMENT - Needs two single-end FASTQ files, the genome in FASTA format and a bowtie2-indexed genome.
When running the pipeline on a local machine without COMPSs:

    python process_hicup.py \
        --config tests/json/config_hicup.json \
        --in_metadata tests/json/input_hicup.json \
        --out_metadata tests/json/output_hicup.json \
        --local

When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

    runcompss \
        --lang=python \
        --library_path=${HOME}/bin \
        --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
        --log_level=debug \
        process_fastq2bed.py \
        --config tests/json/config_hicup.json \
        --in_metadata tests/json/input_hicup.json \
        --out_metadata tests/json/output_hicup.json

Create CHiCAGO input RMAP¶
Create CHiCAGO input BAITMAP¶
Create CHiCAGO input Design files¶
This script uses the .rmap and .baitmap files as input and generates the Design files:
- NPerBin file (.npb): <baitID> <Total no. valid restriction fragments in distance bin 1> … <Total no. valid restriction fragments in distance bin N>, where the bins map within the “proximal” distance range from each bait (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
- NBaitsPerBin file (.nbpb): <otherEndID> <Total no. valid baits in distance bin 1> … <Total no. valid baits in distance bin N>, where the bins map within the “proximal” distance range from each other end (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
- Proximal Other End (ProxOE) file (.poe): <baitID> <otherEndID> <absolute distance> for all combinations of baits and other ends that map within the “proximal” distance range from each other (0; maxLBrownEst].

Data in each file is preceded by a comment line listing the input parameters used to generate them.
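To make the binning of the proximal range (0; maxLBrownEst] concrete, the sketch below assigns a distance to a 1-based bin and parses one data line of a .npb file. It is an illustration only: `distance_bin` and `parse_npb_line` are hypothetical helpers, the bin boundaries are assumed to be open on the left and closed on the right like the overall range, the fields are assumed to be whitespace-separated, and the comment line is assumed to start with `#`. The actual design script may differ on any of these points.

```python
def distance_bin(distance, binsize, max_l_brown_est):
    """Return the 1-based bin index of a distance, or None if it is not proximal.

    Assumption: bin k covers ((k - 1) * binsize, k * binsize].
    """
    if distance <= 0 or distance > max_l_brown_est:
        return None
    # Integer ceiling division avoids floating point rounding.
    return (distance + binsize - 1) // binsize


def parse_npb_line(line):
    """Parse '<baitID> <count bin 1> ... <count bin N>' into (bait_id, counts).

    Returns None for the leading comment line (assumed to start with '#').
    """
    if line.startswith("#"):
        return None
    fields = line.split()
    return int(fields[0]), [int(value) for value in fields[1:]]


# Example values chosen for illustration only.
example_bin = distance_bin(20001, binsize=20000, max_l_brown_est=1500000)
example_row = parse_npb_line("12\t3\t4\t0")
```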
Running from the command line¶
Parameters¶
- config : str
- Configuration JSON file
- in_metadata : str
- Location of input JSON metadata for files
- out_metadata : str
- Location of output JSON metadata for files
Returns¶
- “nbpb” : .nbpb file
- “npb” : .npb file
- “poe” : .poe file
Example¶
REQUIREMENT - Needs RMAP and BAITMAP files
When running the pipeline on a local machine without COMPSs:

    python process_design.py \
        --config tests/json/config_design.json \
        --in_metadata tests/json/input_design.json \
        --out_metadata tests/json/output_design.json \
        --local

When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

    runcompss \
        --lang=python \
        --library_path=${HOME}/bin \
        --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
        --log_level=debug \
        process_design.py \
        --config tests/json/config_design.json \
        --in_metadata tests/json/input_design.json \
        --out_metadata tests/json/output_design.json

Methods¶
- class process_design.process_design(configuration=None)
  This class generates the Design files and the .chinput files used as input for CHiCAGO, starting from the rmap, baitmap and capture Hi-C BAM files.
- run(input_files, metadata, output_files)
  Main function to run the tools MakeDesignFiles_Tool.py and bam2chicago_Tool.py.
  Parameters:
  - input_files (dict) –
    - designDir: path to the folder with the .rmap and .baitmap files
    - rmapFile: path to the .rmap file
    - baitmapFile: path to the .baitmap file
    - bamFile: path to the capture Hi-C BAM files
  - metadata (dict) – input metadata
  - output_files (dict) –
    - outPrefixDesign: path and name of the output prefix; recommended to be the same as for the rmap and baitmap files
    - sample_name: path and name of the .chinput file
  Returns:
  - bool
  - output_metadata
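As an illustration of the dictionaries described for `run`, the sketch below builds `input_files` and `output_files` with the documented keys and checks that none is missing before the call. All paths are hypothetical placeholders, and `missing_keys` is a helper introduced only for this example; the call itself is left commented out because it requires the pipeline package and real input data.

```python
# Keys come from the parameter description above; every path is hypothetical.
input_files = {
    "designDir": "tests/data/design",
    "rmapFile": "tests/data/design/example.rmap",
    "baitmapFile": "tests/data/design/example.baitmap",
    "bamFile": "tests/data/example.bam",
}
output_files = {
    "outPrefixDesign": "tests/data/design/example",
    "sample_name": "tests/data/example.chinput",
}

REQUIRED_INPUT_KEYS = {"designDir", "rmapFile", "baitmapFile", "bamFile"}
REQUIRED_OUTPUT_KEYS = {"outPrefixDesign", "sample_name"}


def missing_keys(files, required):
    """Return the sorted list of required keys absent from a files dict."""
    return sorted(required - set(files))


problems = (missing_keys(input_files, REQUIRED_INPUT_KEYS)
            + missing_keys(output_files, REQUIRED_OUTPUT_KEYS))

# With the package installed, the documented call would then look like:
# from process_design import process_design
# result = process_design(configuration={}).run(input_files, {}, output_files)
```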
Convert BAM file into chicago input files .chinput¶
Data normalization and peak calling¶
This pipeline normalizes the data and calls the real chromatin interactions.
Running from the command line¶
Parameters¶
- config : str
- Configuration JSON file
- in_metadata : str
- Location of input JSON metadata for files
- out_metadata : str
- Location of output JSON metadata for files
Returns¶
output_dir: directory with all output folders and files
Example¶
REQUIREMENT - Needs:
- a reference genome
- a file with the capture sequences in FASTA format
- a settings file
- a design dir containing the .rmap, .baitmap, .npb, .nbpb and .poe files
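Before launching the pipeline it can be useful to confirm that the design directory holds one file of each required type. The sketch below is a minimal check using only the standard library; `missing_design_files` is a hypothetical helper and not part of the pipeline, and it assumes that the files are identified purely by their extension.

```python
from pathlib import Path

# Extensions listed in the requirements above.
DESIGN_EXTENSIONS = (".rmap", ".baitmap", ".npb", ".nbpb", ".poe")


def missing_design_files(design_dir):
    """Return the required extensions with no matching file in design_dir."""
    present = {path.suffix for path in Path(design_dir).iterdir() if path.is_file()}
    return [ext for ext in DESIGN_EXTENSIONS if ext not in present]
```

An empty list means every expected design file type was found; otherwise the list names the extensions that still need to be generated.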
When running the pipeline on a local machine without COMPSs:

    python process_run_chicago.py \
        --config tests/json/config_chicago.json \
        --in_metadata tests/json/input_chicago.json \
        --out_metadata tests/json/output_chicago.json \
        --local

When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

    runcompss \
        --lang=python \
        --library_path=${HOME}/bin \
        --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
        --log_level=debug \
        process_runChicago.py \
        --config tests/json/config_chicago.json \
        --in_metadata tests/json/input_chicago.json \
        --out_metadata tests/json/output_chicago.json

Methods¶
- class process_run_chicago.process_run_chicago(configuration=None)
  Function for processing capture Hi-C FASTQ files. Files are aligned, filtered and analysed for Capture Hi-C peaks.
- run(input_files, metadata, output_files)
  Main function that runs the CHiCAGO pipeline with the runChicago.R wrapper.
  Parameters:
  - input_files (dict) – location of the .chinput files. chinput_file: a str when there is one input file, or a comma-separated list when there is more than one input file.
  - metadata (dict) – input metadata, str
  - output_files (dict) – output file locations
  Returns:
  - output_files (dict) – folder location with the output files
  - output_metadata (dict) – output metadata for the associated files in output_files
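Since `chinput_file` may hold either a single path or a comma-separated list of paths, a caller might normalise it into a list before use. The following is a small sketch of that idea; `split_chinput` is a hypothetical helper, the file names are placeholders, and the pipeline itself may handle the value differently.

```python
def split_chinput(chinput_file):
    """Turn a single path or a comma-separated string of paths into a list."""
    return [path.strip() for path in chinput_file.split(",") if path.strip()]


# Hypothetical file names, for illustration only.
single = split_chinput("sample1.chinput")
several = split_chinput("sample1.chinput, sample2.chinput")
```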