Pipelines

Map and parse CHi-C reads

This pipeline takes as input two FASTQ files, the restriction enzyme (RE) sites, the genome indexed with GEM and the same genome as a FASTA file. It uses TADbit to map and filter the reads and to produce a BED file that is later used to generate a BAM file compatible with the CHiCAGO algorithm. More information about mapping and filtering: https://3dgenomes.github.io/TADbit/

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

wd : folders and files
Path to the working directory containing the output files

Example

REQUIREMENT - Needs two single-end FASTQ files, the genome in FASTA format and the bowtie2-indexed genome.

When running the pipeline on a local machine without COMPSs:

python process_hicup.py \
   --config tests/json/config_hicup.json \
   --in_metadata tests/json/input_hicup.json \
   --out_metadata tests/json/output_hicup.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_hicup.py           \
      --config tests/json/config_hicup.json \
      --in_metadata tests/json/input_hicup.json \
      --out_metadata tests/json/output_hicup.json

Methods

class process_hicup.process_hicup(configuration=None)[source]

This class runs the HiCUP tool, which performs the alignment and filtering of the reads and converts them into a BAM file.

run(input_files, metadata, output_files)[source]

This is the main function that runs the HiCUP tool.

Parameters:
  • input_files (dict) – fastq1, fastq2
  • metadata (dict) – input metadata
  • output_files (dict) –
    out_dir: str
    Directory to write the output to
Returns:

  • results (bool)
  • output_metadata (dict)
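The pipeline is driven by the three JSON files passed on the command line. A minimal sketch of building one programmatically, so that the file on disk matches what a hand-edited config would look like. The key names and paths below are illustrative assumptions, not the pipeline's verified schema:

```python
import json
import os
import tempfile

# Hypothetical configuration for the process_hicup pipeline.
# The parameter names and paths are assumptions for illustration.
config = {
    "input_files": [
        {"name": "fastq1", "value": "tests/data/test_hicup_1.fastq"},
        {"name": "fastq2", "value": "tests/data/test_hicup_2.fastq"},
    ],
    "arguments": [
        {"name": "genome_fa", "value": "tests/data/genome.fa"},
    ],
    "output_files": [
        {"name": "out_dir", "value": "tests/data/output/"},
    ],
}

# Write the JSON to a temporary file, as a hand-written config would be.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as handle:
    json.dump(config, handle, indent=4)

# Reading it back yields the structure the pipeline would receive.
with open(path) as handle:
    loaded = json.load(handle)
os.remove(path)
```

The same shape applies to the `--in_metadata` and `--out_metadata` files, with one entry per input or output file.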

Create CHiCAGO input RMAP

Create CHiCAGO input BAITMAP

Create CHiCAGO input Design files

This script takes .rmap and .baitmap files as input and generates the design files:
  • NPerBin file (.npb): <baitID> <Total no. valid restriction fragments in distance bin 1> … <Total no. valid restriction fragments in distance bin N>, where the bins map within the “proximal” distance range from each bait (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
  • NBaitsPerBin file (.nbpb): <otherEndID> <Total no. valid baits in distance bin 1> … <Total no. valid baits in distance bin N>, where the bins map within the “proximal” distance range from each other end (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
  • Proximal Other End (ProxOE) file (.poe): <baitID> <otherEndID> <absolute distance> for all combinations of baits and other ends that map within the “proximal” distance range from each other (0; maxLBrownEst].
The data in each file are preceded by a comment line listing the input parameters used to generate them.
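As a concrete illustration, each data line of an NPerBin file is a bait ID followed by one fragment count per distance bin. A minimal parsing sketch, assuming tab-separated fields and a leading '#' comment line; the sample values are invented:

```python
# Parse the data lines of a hypothetical .npb (NPerBin) file:
# <baitID> <count bin 1> ... <count bin N>, tab-separated,
# preceded by a comment line listing the generation parameters.
sample = (
    "#\tmaxLBrownEst=1500000\tbinsize=20000\n"
    "367\t12\t15\t9\t22\n"
)

records = {}
for line in sample.splitlines():
    if line.startswith("#"):
        continue  # skip the parameter comment line
    fields = line.split("\t")
    bait_id = int(fields[0])
    counts = [int(value) for value in fields[1:]]
    records[bait_id] = counts  # per-bin valid-fragment counts
```

The .nbpb format is parsed the same way with an other-end ID in the first column, and the .poe format is a three-column record per bait/other-end pair.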

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

“nbpb” : .nbpb file
“npb” : .npb file
“poe” : .poe file

Example

REQUIREMENT - Needs RMAP and BAITMAP files

When running the pipeline on a local machine without COMPSs:

python process_design.py \
   --config tests/json/config_design.json \
   --in_metadata tests/json/input_design.json \
   --out_metadata tests/json/output_design.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_design.py         \
      --config tests/json/config_design.json \
      --in_metadata tests/json/input_design.json \
      --out_metadata tests/json/output_design.json

Methods

class process_design.process_design(configuration=None)[source]

This class generates the design files and the .chinput files, the input for CHiCAGO, starting from the .rmap and .baitmap files and the capture Hi-C BAM files.

run(input_files, metadata, output_files)[source]

Main function that runs the tools MakeDesignFiles_Tool.py and bam2chicago_Tool.py.

Parameters:
  • input_files (dict) –
    designDir: path to the folder with the .rmap and .baitmap files
    rmapFile: path to the .rmap file
    baitmapFile: path to the .baitmap file
    bamFile: path to the capture Hi-C BAM files
  • metadata (dict) – input metadata
  • output_files (dict) –
    outPrefixDesign: path and name of the output prefix; recommended to be the same as the .rmap and .baitmap files
    sample_name: path and name of the .chinput file
Returns:

  • bool
  • output_metadata
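The run() call above receives its file locations as plain dictionaries. A minimal sketch of assembling and sanity-checking them before invoking the class; the key names come from the parameter list above, while the paths are placeholder assumptions:

```python
# Key names from the documented parameter list; paths are placeholders.
input_files = {
    "designDir": "tests/data/design/",
    "rmapFile": "tests/data/design/chr21.rmap",
    "baitmapFile": "tests/data/design/chr21.baitmap",
    "bamFile": "tests/data/capture_hic.bam",
}
output_files = {
    "outPrefixDesign": "tests/data/design/chr21",
    "sample_name": "tests/data/sample.chinput",
}

# Fail early if a required input entry is missing.
required = {"designDir", "rmapFile", "baitmapFile", "bamFile"}
missing = required - set(input_files)
if missing:
    raise ValueError("missing input entries: %s" % sorted(missing))
```

Keeping outPrefixDesign aligned with the .rmap/.baitmap names, as recommended above, means all design files share one prefix.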

Convert BAM file into chicago input files .chinput

Data normalization and peak calling

This pipeline normalizes the data and calls the real chromatin interactions.

Running from the command line

Parameters

config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns

output_dir: directory with all output folders and files

Example

REQUIREMENT - Needs:
  • a reference genome
  • a file with the capture sequences in FASTA format
  • a settings file
  • a design directory containing the .rmap, .baitmap, .npb, .nbpb and .poe files

When running the pipeline on a local machine without COMPSs:

python process_run_chicago.py \
   --config tests/json/config_chicago.json \
   --in_metadata tests/json/input_chicago.json \
   --out_metadata tests/json/output_chicago.json \
   --local

When using a local version of the [COMPS virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):

runcompss                     \
   --lang=python              \
   --library_path=${HOME}/bin \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug          \
   process_run_chicago.py     \
      --config tests/json/config_chicago.json \
      --in_metadata tests/json/input_chicago.json \
      --out_metadata tests/json/output_chicago.json

Methods

class process_run_chicago.process_run_chicago(configuration=None)[source]

Function for processing capture Hi-C FASTQ files. Files are aligned, filtered and analysed for capture Hi-C peaks.

run(input_files, metadata, output_files)[source]

This is the main function that runs the CHiCAGO pipeline with the runChicago.R wrapper.

Parameters:
  • input_files (dict) – location of the .chinput files.
    chinput_file: str, a single input file, or a comma-separated list when there is more than one input file
  • metadata (dict) – Input metadata, str
  • output (dict) – output file locations
Returns:

  • output_files (dict) – Folder location with the output files
  • output_metadata (dict) – Output metadata for the associated files in output_files
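Because chinput_file may hold either one path or a comma-separated list, a caller has to normalise it before use. A minimal sketch of that step; the split-and-strip logic is an assumption about how such a value would be consumed, not the wrapper's verified behaviour:

```python
def chinput_paths(chinput_file):
    """Return the list of .chinput paths named by a chinput_file value,
    which may be a single path or a comma-separated list."""
    return [path.strip() for path in chinput_file.split(",") if path.strip()]

# One replicate: a single path comes back as a one-element list.
single = chinput_paths("tests/data/sample.chinput")
# Several replicates: the comma-separated list is split and trimmed.
several = chinput_paths("rep1.chinput, rep2.chinput")
```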

Run the entire CHi-C pipeline