Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities.

This website supports Berger, Philippakis, et al., Nature Biotechnology, Epub 2006 Sept 24.

We have created a novel, maximally compact, synthetic DNA sequence design for protein binding microarrays (PBMs) that represents all possible DNA sequence variants of a given length k in an overlapping fashion on a single slide. We constructed such ‘all 10-mer’ microarrays by converting high-density single-stranded Agilent oligonucleotide arrays to double-stranded DNA arrays via on-array primer extension. Using these microarrays, we comprehensively determined the binding specificities over a full range of affinities of several TFs of diverse structural classes from yeast, worm, mouse, and human. We also developed a novel computational method to construct TF binding site motifs that takes full advantage of the unbiased sequence representation on these arrays, using all features (rather than only those above an arbitrary cutoff) to determine the optimal motif without any prior knowledge. As advances in microarray printing technology permit increased feature densities and feature lengths, our universal design will enable the complete coverage of even longer binding sites. This unbiased, comprehensive coverage of all k-mers permits interrogation of binding site preferences, including nucleotide interdependencies, at unprecedented resolution.
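As an illustration of what "all possible DNA sequence variants of a given length k in an overlapping fashion" means, the following minimal Python sketch builds a de Bruijn sequence of order k over the DNA alphabet and cuts it into overlapping probes. It is not the construction used in the manuscript: it uses the textbook Lyndon-word recursion as a stand-in, and the function names (`de_bruijn`, `kmers_in_cycle`, `tile_probes`) and the `probe_len` parameter are hypothetical choices made only for this example.

```python
from itertools import product


def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Uses the standard Lyndon-word recursion (an assumption for this
    sketch, not necessarily the method used for the published arrays).
    """
    n = len(alphabet)
    a = [0] * (k + 1)
    out = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the current Lyndon word if its length divides k.
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def kmers_in_cycle(seq: str, k: int) -> list:
    """List the k-mers read around the cyclic sequence, one per start position."""
    linear = seq + seq[:k - 1]
    return [linear[i:i + k] for i in range(len(seq))]


def tile_probes(seq: str, k: int, probe_len: int) -> list:
    """Cut the cyclic sequence into probes that overlap by k - 1 bases.

    `probe_len` is a hypothetical variable-region length; the overlap
    ensures that no k-mer is lost at a probe boundary.
    """
    linear = seq + seq[:k - 1]
    step = probe_len - (k - 1)
    return [linear[i:i + probe_len] for i in range(0, len(seq), step)]


if __name__ == "__main__":
    k = 4
    seq = de_bruijn("ACGT", k)
    all_kmers = sorted("".join(p) for p in product("ACGT", repeat=k))
    # Every possible k-mer occurs exactly once in the cyclic sequence.
    assert sorted(kmers_in_cycle(seq, k)) == all_kmers
    probes = tile_probes(seq, k, probe_len=20)
    print(len(seq), len(probes))
```

Because each probe of length `probe_len` carries `probe_len - k + 1` distinct k-mers, roughly `4**k / (probe_len - k + 1)` probes are enough to cover every k-mer once under these assumptions, which is why the design is compact.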

Below are the Supplementary data files that accompany this manuscript.

Supplementary Methods

Supplementary Table 1: Raw microarray data for all experiments performed in this study.

Supplementary Table 2: Enrichment scores calculated for all possible 8-mers for each transcription factor examined in this study.

Supplementary Table 3: Enrichment scores calculated for all possible 9-mers for each transcription factor examined in this study.

For the normalized signal intensities and DNA probe sequences of our two separate Agilent ‘all 10-mer’ universal microarrays, please click here.

Supplementary Fig. 1 (PDF 572K):
Survey of binding sites in the JASPAR database.

Supplementary Fig. 2 (PDF 516K):
Reproducibility of Cy3 dUTP signal intensities.

Supplementary Fig. 3 (PDF 680K):
Correlation of PBM signal intensities with affinities.

Supplementary Fig. 4 (PDF 632K):
Effects of binding site position and orientation on PBM signal.

Supplementary Fig. 5 (PDF 656K):
Comparison of median signal intensities for 28 Zif268 variants for fixed versus variable position, orientation, and flanking sequence.

Supplementary Fig. 6 (PDF 300K):
Correspondence between median signal intensities for 7-mers on distinct de Bruijn sequences.

Supplementary Fig. 7 (PDF 300K):
Biacore measurements supporting interdependence between the first two positions of the Cbf1 binding site.

Supplementary Fig. 8 (PDF 656K):
Minimum number of unique features on an array for different values of k.

Supplementary Fig. 9 (PDF 612K):
Dependence of Cy3 dUTP incorporation upon sequence context.

Questions? Contact Mike Berger, Anthony Philippakis, or Martha Bulyk.