Vishaka Datta Screenivasa Gopalan: ChIP-seq simulations reveal key sources of variations and suggest experimental design

Date: Apr 27, 2018
Time: 02:00 PM - 03:00 PM (Local Time Germany)
Speaker: Vishaka Datta Screenivasa Gopalan from the National Centre for Biological Sciences (NCBS), Bangalore, India
Location: MPI Plön
Room: Practical Room
Host: Paul Rainey

Abstract:

ChIP-seq (chromatin immunoprecipitation followed by sequencing) is a high-throughput technique that yields a set of genomic regions bound by a transcription factor (TF). ChIP-seq data has been shown to depend on several biological factors, such as the presence of nucleosomes, indirect binding and cooperative binding. At the same time, experimental factors, such as the use of antibodies, cross-linking and PCR in the protocol, have also been shown to affect ChIP-seq data. The impact of these two sets of factors on inferences made from ChIP-seq data is unclear. We address this question by simulating a ChIP-seq experiment in which we model both the binding of a TF across the genome and the experimental processes of fragment extraction, PCR amplification and sequencing. We find that the TF motif (position weight matrix, or PWM) can be easily recovered even when extraction (antibody and cross-linking) and PCR amplification efficiencies vary across the genome. The information content of the recovered motif decreases as the fraction of cooperatively or indirectly bound sites increases. We also find that ChIP-seq read counts can effectively distinguish between two binding sites of different affinities only when those affinities are relatively high in comparison to the rest of the genome. Finally, the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at low-affinity sites is larger than what community standards recommend. Our results suggest recommendations for the ChIP-seq protocol to improve its sensitivity, and also establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq.
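
To make the simulated pipeline described in the abstract concrete, the following is a minimal toy sketch of its four stages (binding, fragment extraction, PCR amplification, sequencing). It is not the speaker's model: the function name `simulate_chip_seq`, all parameter names, the logistic occupancy formula and the binomial/multinomial sampling steps are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_chip_seq(energies, mu, n_cells, p_extract, pcr_eff,
                      n_cycles, n_reads, rng):
    """Toy ChIP-seq pipeline (illustrative assumptions throughout)."""
    energies = np.asarray(energies, dtype=float)

    # Binding (assumed model): logistic occupancy per site, with binding
    # energies and chemical potential mu expressed in units of kT.
    occupancy = 1.0 / (1.0 + np.exp(energies - mu))

    # Number of cells in which each site is bound.
    bound = rng.binomial(n_cells, occupancy)

    # Extraction: each bound fragment is recovered with probability
    # p_extract (scalar, or one value per site to model varying efficiency).
    extracted = rng.binomial(bound, p_extract)

    # PCR: in each cycle every fragment is duplicated with probability
    # pcr_eff (scalar or per-site).
    amplified = extracted.copy()
    for _ in range(n_cycles):
        amplified = amplified + rng.binomial(amplified, pcr_eff)

    # Sequencing: draw n_reads reads in proportion to fragment abundance.
    total = amplified.sum()
    if total == 0:
        return occupancy, np.zeros_like(amplified)
    reads = rng.multinomial(n_reads, amplified / total)
    return occupancy, reads

# Hypothetical usage: four sites of decreasing affinity.
rng = np.random.default_rng(0)
occupancy, reads = simulate_chip_seq(
    energies=[0.0, 2.0, 4.0, 6.0], mu=1.0, n_cells=10000,
    p_extract=0.5, pcr_eff=0.8, n_cycles=10, n_reads=5000, rng=rng)
print(occupancy, reads)
```

Passing per-site arrays for `p_extract` or `pcr_eff` instead of scalars is one simple way to mimic extraction and amplification efficiencies that vary across the genome.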