Vishaka Datta Screenivasa Gopalan: ChIP-seq simulations reveal key sources of variations and suggest experimental design
- Date: Apr 27, 2018
- Time: 02:00 PM - 03:00 PM (Local Time Germany)
- Speaker: Vishaka Datta Screenivasa Gopalan from the National Centre for Biological Sciences (NCBS), Bangalore, India
- Location: MPI Plön
- Room: Practical Room
- Host: Paul Rainey
Abstract:
ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a
high-throughput technique that yields a set of genomic regions that are
bound by a transcription factor (TF). ChIP-seq data has been shown to
depend on several biological factors, such as the presence of
nucleosomes, indirect binding and cooperative binding. At the same time,
experimental factors, such as the use of antibodies, cross-linking and
PCR in the protocol, have also been shown to affect ChIP-seq data. The
impact of these two sets of factors on inferences made from ChIP-seq
data is unclear. We address this question by simulating a ChIP-seq
experiment where we model both the binding of a TF across the genome and
the experimental processes of fragment extraction, PCR amplification
and sequencing. We find that the TF motif (position weight matrix, or
PWM) can be easily recovered even when extraction (antibody and
cross-linking) and PCR amplification efficiencies vary across the
genome. The information content of the recovered motif decreases as the
fraction of cooperatively or indirectly bound sites increases. We
also find that ChIP-seq read counts can effectively distinguish between
two binding sites of different affinities only when both affinities are
high relative to the rest of the genome. Finally, the number of
ChIP-seq replicates needed to accurately measure in vivo occupancy at
low-affinity sites is larger than what community standards
recommend. Our results suggest some recommendations for the ChIP-seq
protocol to improve its sensitivity, and also establish statistical
limits on the accuracy of inferences of protein-DNA binding from
ChIP-seq.
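
To illustrate what "information content of the recovered motif" refers to, here is a minimal Python sketch. It is not the speaker's simulation: the sequences are made up, the background is assumed uniform (0.25 per base), and the helper names `pwm_from_sites` and `information_content` are hypothetical. It only shows that mixing unrelated sequences, standing in for indirectly or cooperatively bound sites, into the sites used to build a PWM lowers its information content.

```python
import math

BASES = "ACGT"

def pwm_from_sites(sites):
    """Column-wise base frequencies from equal-length DNA sequences."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: 0 for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = len(sites)
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def information_content(pwm, background=0.25):
    """Total information content in bits, assuming a uniform background."""
    ic = 0.0
    for col in pwm:
        for p in col.values():
            if p > 0:  # terms with p == 0 contribute nothing
                ic += p * math.log2(p / background)
    return ic

# Toy data (assumption): eight copies of one consensus site,
# versus four copies plus four unrelated sequences.
direct = ["TGACTCA"] * 8
mixed = direct[:4] + ["ACGTACG", "CATGGTC", "GGTCAAT", "TTACGGA"]

ic_direct = information_content(pwm_from_sites(direct))
ic_mixed = information_content(pwm_from_sites(mixed))

print(f"all sites directly bound: {ic_direct:.2f} bits")
print(f"half the sites unrelated: {ic_mixed:.2f} bits")
```

With perfectly conserved sites, each of the 7 positions contributes 2 bits, so the first value is 14 bits. The second value is lower because the unrelated sequences flatten the base frequencies in every column.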