Pipelines

Map and parse CHi-C reads

This pipeline takes as input two FASTQ files, the restriction enzyme (RE) sites, the genome indexed with GEM and the same genome as a FASTA file. It uses TADbit to map and filter the reads and to produce a BED file that is later used to generate a BAM file compatible with the CHiCAGO algorithm. More information about mapping and filtering: https://3dgenomes.github.io/TADbit/

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

wd : folders and files
Path to the working directory containing the output files

Example

REQUIREMENT - Needs two single-end FASTQ files, the genome in FASTA format and the bowtie2-indexed genome.

When running the pipeline on a local machine without COMPSs:

python process_hicup.py \
   --config tests/json/config_hicup.json \
   --in_metadata tests/json/input_hicup.json \
   --out_metadata tests/json/output_hicup.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_hicup.py           \
      --config tests/json/config_hicup.json \
      --in_metadata tests/json/input_hicup.json \
      --out_metadata tests/json/output_hicup.json

Methods

class process_hicup.process_hicup(configuration=None)[source]

This class runs the HiCUP tool, which performs the alignment and filtering of the reads and converts them into a BAM file.

run(input_files, metadata, output_files)[source]

This is the main function that runs the HiCUP tool.

Parameters:
  • input_files (dict) – fastq1, fastq2
  • metadata (dict) – input metadata
  • output_files (dict) –
    out_dir: str
    Directory to write the output to
Returns:

  • results (bool)
  • output_metadata (dict)
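The pipeline is driven by the three JSON files passed on the command line. A minimal sketch of building one programmatically, so that the file on disk matches what a hand-edited config would look like. The key names and paths below are illustrative assumptions, not the pipeline's verified schema:

```python
import json
import os
import tempfile

# Hypothetical configuration for the process_hicup pipeline.
# The parameter names and paths are assumptions for illustration.
config = {
    "input_files": [
        {"name": "fastq1", "value": "tests/data/test_hicup_1.fastq"},
        {"name": "fastq2", "value": "tests/data/test_hicup_2.fastq"},
    ],
    "arguments": [
        {"name": "genome_fa", "value": "tests/data/genome.fa"},
    ],
    "output_files": [
        {"name": "out_dir", "value": "tests/data/output/"},
    ],
}

# Write the JSON to a temporary file, as a hand-written config would be.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as handle:
    json.dump(config, handle, indent=4)

# Reading it back yields the structure the pipeline would receive.
with open(path) as handle:
    loaded = json.load(handle)
os.remove(path)
```

The same shape applies to the `--in_metadata` and `--out_metadata` files, with one entry per input or output file.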

Create CHiCAGO input RMAP

Create CHiCAGO input BAITMAP

Create CHiCAGO input Design files

This script takes .rmap and .baitmap files as input and generates the design files:
  • NPerBin file (.npb): <baitID> <Total no. valid restriction fragments in distance bin 1> … <Total no. valid restriction fragments in distance bin N>, where the bins map within the “proximal” distance range from each bait (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
  • NBaitsPerBin file (.nbpb): <otherEndID> <Total no. valid baits in distance bin 1> … <Total no. valid baits in distance bin N>, where the bins map within the “proximal” distance range from each other end (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
  • Proximal Other End (ProxOE) file (.poe): <baitID> <otherEndID> <absolute distance> for all combinations of baits and other ends that map within the “proximal” distance range from each other (0; maxLBrownEst].
The data in each file are preceded by a comment line listing the input parameters used to generate them.
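As a concrete illustration, each data line of an NPerBin file is a bait ID followed by one fragment count per distance bin. A minimal parsing sketch, assuming tab-separated fields and a leading '#' comment line; the sample values are invented:

```python
# Parse the data lines of a hypothetical .npb (NPerBin) file:
# <baitID> <count bin 1> ... <count bin N>, tab-separated,
# preceded by a comment line listing the generation parameters.
sample = (
    "#\tmaxLBrownEst=1500000\tbinsize=20000\n"
    "367\t12\t15\t9\t22\n"
)

records = {}
for line in sample.splitlines():
    if line.startswith("#"):
        continue  # skip the parameter comment line
    fields = line.split("\t")
    bait_id = int(fields[0])
    counts = [int(value) for value in fields[1:]]
    records[bait_id] = counts  # per-bin valid-fragment counts
```

The .nbpb format is parsed the same way with an other-end ID in the first column, and the .poe format is a three-column record per bait/other-end pair.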

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

“nbpb” : .nbpb file
“npb” : .npb file
“poe” : .poe file

Example

REQUIREMENT - Needs RMAP and BAITMAP files

When running the pipeline on a local machine without COMPSs:

python process_design.py \
   --config tests/json/config_design.json \
   --in_metadata tests/json/input_design.json \
   --out_metadata tests/json/output_design.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_design.py         \
      --config tests/json/config_design.json \
      --in_metadata tests/json/input_design.json \
      --out_metadata tests/json/output_design.json

Methods

class process_design.process_design(configuration=None)[source]

This class generates the design files and the .chinput files, the input for CHiCAGO, starting from the .rmap and .baitmap files and the capture Hi-C BAM files.

run(input_files, metadata, output_files)[source]

Main function that runs the tools MakeDesignFiles_Tool.py and bam2chicago_Tool.py.

Parameters:
  • input_files (dict) –
    designDir: path to the folder with the .rmap and .baitmap files
    rmapFile: path to the .rmap file
    baitmapFile: path to the .baitmap file
    bamFile: path to the capture Hi-C BAM files
  • metadata (dict) – input metadata
  • output_files (dict) –
    outPrefixDesign: path and name of the output prefix; recommended to be the same as the .rmap and .baitmap files
    sample_name: path and name of the .chinput file
Returns:

  • bool
  • output_metadata
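The run() call above receives its file locations as plain dictionaries. A minimal sketch of assembling and sanity-checking them before invoking the class; the key names come from the parameter list above, while the paths are placeholder assumptions:

```python
# Key names from the documented parameter list; paths are placeholders.
input_files = {
    "designDir": "tests/data/design/",
    "rmapFile": "tests/data/design/chr21.rmap",
    "baitmapFile": "tests/data/design/chr21.baitmap",
    "bamFile": "tests/data/capture_hic.bam",
}
output_files = {
    "outPrefixDesign": "tests/data/design/chr21",
    "sample_name": "tests/data/sample.chinput",
}

# Fail early if a required input entry is missing.
required = {"designDir", "rmapFile", "baitmapFile", "bamFile"}
missing = required - set(input_files)
if missing:
    raise ValueError("missing input entries: %s" % sorted(missing))
```

Keeping outPrefixDesign aligned with the .rmap/.baitmap names, as recommended above, means all design files share one prefix.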

Convert BAM file into chicago input files .chinput

Data normalization and peak calling

This pipeline normalizes the data and calls the real chromatin interactions.

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

output_dir: directory with all output folders and files

Example

REQUIREMENT - Needs:
  • a reference genome
  • a file with the capture sequences in FASTA format
  • a settings file
  • a design directory containing the .rmap, .baitmap, .npb, .nbpb and .poe files

When running the pipeline on a local machine without COMPSs:

python process_run_chicago.py \
   --config tests/json/config_chicago.json \
   --in_metadata tests/json/input_chicago.json \
   --out_metadata tests/json/output_chicago.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_run_chicago.py     \
      --config tests/json/config_chicago.json \
      --in_metadata tests/json/input_chicago.json \
      --out_metadata tests/json/output_chicago.json

Methods

class process_run_chicago.process_run_chicago(configuration=None)[source]

Function for processing capture Hi-C FASTQ files. Files are aligned, filtered and analysed for capture Hi-C peaks.

run(input_files, metadata, output_files)[source]

This is the main function that runs the CHiCAGO pipeline with the runChicago.R wrapper.

Parameters:
  • input_files (dict) – location of the .chinput files.
    chinput_file: str, a single input file, or a comma-separated list when there is more than one input file
  • metadata (dict) – Input metadata, str
  • output (dict) – output file locations
Returns:

  • output_files (dict) – Folder location with the output files
  • output_metadata (dict) – Output metadata for the associated files in output_files
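Because chinput_file may hold either one path or a comma-separated list, a caller has to normalise it before use. A minimal sketch of that step; the split-and-strip logic is an assumption about how such a value would be consumed, not the wrapper's verified behaviour:

```python
def chinput_paths(chinput_file):
    """Return the list of .chinput paths named by a chinput_file value,
    which may be a single path or a comma-separated list."""
    return [path.strip() for path in chinput_file.split(",") if path.strip()]

# One replicate: a single path comes back as a one-element list.
single = chinput_paths("tests/data/sample.chinput")
# Several replicates: the comma-separated list is split and trimmed.
several = chinput_paths("rep1.chinput, rep2.chinput")
```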

Run the entire CHi-C pipeline