abc4pwm



abc4pwm is a software tool for clustering position weight matrices (PWMs), classifying PWMs by their DNA-binding domain (DBD), and searching for motifs. It also includes other supporting modules.
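In motif search, a PWM is typically stored as per-position base probabilities. These are converted to log-odds scores and slid along a sequence. The sketch below is a generic forward-strand scan and is not abc4pwm's own search code. The toy TGAC matrix, the uniform 0.25 background, the pseudocount and the score threshold are all illustrative assumptions.

```python
import math

BASES = "ACGT"

def log_odds_matrix(pwm, background=None, pseudocount=0.01):
    """Convert a probability PWM (one {base: p} dict per position) to log2-odds scores."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    matrix = []
    for column in pwm:
        total = sum(column[b] + pseudocount for b in BASES)
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                       for b in BASES})
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy PWM with consensus TGAC (illustrative values only, not from abc4pwm's database)
strong, weak = 0.7, 0.1
toy_pwm = [{b: strong if b == c else weak for b in BASES} for c in "TGAC"]
hits = scan("AATGACGGTGACA", log_odds_matrix(toy_pwm), threshold=5.0)
print(hits)
```

Both TGAC occurrences (starting at positions 2 and 8) pass the threshold. Windows with a single mismatch fall well below it. A full scanner would also score the reverse complement.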

Authors:

Omer Ali (1), Amna Farooq (1), Mingyi Yang (3, 4), Magnar Bjørås (4, 7), Victor Jin (5), Junbai Wang (1, 2, 6)*

  1. Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
  2. Department of Clinical Molecular Biology, University of Oslo, Oslo, Norway
  3. Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway
  4. Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
  5. Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  6. Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital, Lørenskog, Norway
  7. Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway

* To whom correspondence should be addressed. Email: junbai.wang@medisin.uio.no

Abstract

Background: Applying high-throughput sequencing technology to protein-DNA interactions generates an ever-increasing number of transcription factor (TF) binding motif collections. These are maintained in different databases built from numerous sources. Efficient tools are lacking for clustering biologically relevant or similar motifs, such as position weight matrices (PWMs), whether they come from experimental detection or in silico prediction. Moreover, an automatic method is needed to assess the quality of PWM clusters.

Results: This work presents a new package, Affinity Based Clustering for Position Weight Matrices (abc4pwm), which clusters PWMs either with or without DNA-binding domain (DBD) information. Abc4pwm can generate a representative motif for each cluster, automatically evaluate the clustering quality of PWMs, and filter out wrongly clustered PWMs. Additionally, it can:

- automatically update the human DBD family database,
- classify known human TF PWMs into their respective DBD families, and
- perform TF motif search and motif discovery with a new ensemble learning approach.
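The package's source directory name (AffinityPropogation_Clustering) suggests that the clustering builds on affinity propagation. That technique selects exemplars directly from pairwise similarities, so the number of clusters need not be fixed in advance. The sketch below illustrates the general technique on PWMs and is not the abc4pwm implementation. Several choices are assumptions made for clarity:

- the similarity (best mean column-wise Pearson correlation over ungapped offsets),
- the median preference and the damping value, and
- the helper names pwm_from_consensus, pwm_similarity and affinity_propagation.

Each cluster's exemplar stands in for a representative motif here. abc4pwm may build its representative motifs differently.

```python
from statistics import median

BASES = "ACGT"

def pwm_from_consensus(consensus, strength=0.7):
    """Hypothetical toy PWM: `strength` on the consensus base, the rest split evenly."""
    other = (1.0 - strength) / 3
    return [[strength if b == c else other for b in BASES] for c in consensus]

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [v - mx for v in x], [v - my for v in y]
    den = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def pwm_similarity(p, q, min_overlap=3):
    """Best mean column-wise correlation over ungapped offsets with >= min_overlap columns."""
    best = float("-inf")
    for shift in range(min_overlap - len(q), len(p) - min_overlap + 1):
        pairs = [(p[i], q[i - shift]) for i in range(len(p)) if 0 <= i - shift < len(q)]
        if len(pairs) >= min_overlap:
            best = max(best, sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return best

def affinity_propagation(S, preference=None, damping=0.5, max_iter=200, convergence_iter=15):
    """Plain affinity propagation on a similarity matrix; returns (exemplars, labels)."""
    n = len(S)
    S = [row[:] for row in S]  # work on a copy; the diagonal is overwritten below
    if preference is None:  # common default: median of off-diagonal similarities
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # Responsibility: how suited k is as exemplar for i, versus i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated support from other points for k being an exemplar.
        for k in range(n):
            positive = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        current = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= convergence_iter:
            break
    if not exemplars:
        raise RuntimeError("no exemplars found; try a higher preference or more iterations")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels

motifs = ["ACGT", "ACGA", "CCGT", "TTGC", "TTGG", "ATGC"]
pwms = [pwm_from_consensus(m) for m in motifs]
S = [[pwm_similarity(p, q) for q in pwms] for p in pwms]
exemplars, labels = affinity_propagation(S)
for motif, label in zip(motifs, labels):
    print(motif, "-> cluster of", motifs[label])
```

With the median similarity as the preference, the six toy motifs should split into two clusters, one per consensus family. Raising the preference yields more, smaller clusters. The pure-Python loops scale as O(n^3) per iteration, so a real collection of about 1770 PWMs would call for a vectorised implementation.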

Conclusion: The application of abc4pwm to DNA sequence analysis of several types of high-throughput sequencing data (e.g., RNA-seq and ChIP-seq) is demonstrated using ~1770 human TF PWMs. It not only recovers known TF motifs at gene promoters based on gene expression profiles, but also identifies true TF binding targets in ChIP-seq experiments. Both the clustering of PWMs and the automatic quality assessment of the clusters significantly reduce computational time in data analysis and enhance biologically meaningful interpretation. Abc4pwm is a useful tool for DNA sequence analysis.

How to start:

abc4pwm is written in Python. It can be installed and run from the command line on both Linux and macOS. The package can be downloaded here.

The dependencies must be installed before the package itself. The list of dependencies is as follows:

We advise installing the dependencies with Miniconda. The package contains a file, requirements.txt, that can be used to install the dependencies automatically with conda or pip. To install the package, go to the AffinityPropogation_Clustering directory and type: python setup.py install. For more details, see the README file in the package.
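Before running python setup.py install, it can help to check which entries of requirements.txt are already present in the active environment. The helper below is a hypothetical convenience script and is not part of abc4pwm. It assumes one pip-style requirement per line and checks installed Python distribution names with importlib.metadata. A conda package whose name differs from its Python distribution name may therefore be reported as missing.

```python
import os
import re
from importlib import metadata

def missing_requirements(lines):
    """Return names from pip-style requirement lines that are not installed."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r or -e
            continue
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]  # strip version specifiers/extras
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__" and os.path.exists("requirements.txt"):
    with open("requirements.txt") as fh:  # assumes it is run from the package root
        print(missing_requirements(fh) or "all listed requirements are installed")
```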

Contents of the package:

The package folder will contain the following:

Pipeline Tasks:

The pipeline consists of the following tasks. To run a task, type abc4pwm <task> [<args>]. To see the options for each task of the pipeline, run: abc4pwm -h
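When the pipeline is driven from another Python script, the command form abc4pwm <task> [<args>] can be wrapped with subprocess. This is a sketch under assumptions. The helper names abc4pwm_command and run_abc4pwm are hypothetical, and valid task names and arguments should be taken from the output of abc4pwm -h.

```python
import shutil
import subprocess

def abc4pwm_command(task=None, *args):
    """Build argv for `abc4pwm <task> [<args>]`; with no task, ask for help (`abc4pwm -h`)."""
    if task is None:
        return ["abc4pwm", "-h"]
    return ["abc4pwm", task, *(str(a) for a in args)]

def run_abc4pwm(task=None, *args):
    """Run abc4pwm if it is on PATH; raises CalledProcessError on a non-zero exit."""
    if shutil.which("abc4pwm") is None:
        raise FileNotFoundError("abc4pwm is not on PATH; install the package first")
    return subprocess.run(abc4pwm_command(task, *args),
                          check=True, capture_output=True, text=True)

if __name__ == "__main__" and shutil.which("abc4pwm"):
    print(run_abc4pwm().stdout)  # runs `abc4pwm -h` and shows its standard output
```

Passing arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.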

Demo

A test run on human PWM data is available in the demo folder. The folder abc4pwm/demo contains demos of all modules and study cases. To run them automatically, enter ./demo on the command line.

Having trouble with the package? Contact us at omerali.0191@gmail.com or junbai.wang@medisin.uio.no and we will be glad to help you.