Substitutional landscape of a split fluorescent protein fragment using high-density peptide microarrays

Split fluorescent proteins have wide applicability as biosensors for protein-protein interactions, genetically encoded tags for protein detection and localization, as well as fusion partners in super-resolution microscopy. We have established and validated a novel platform for functional analysis of leave-one-out split fluorescent proteins (LOO-FPs) in high throughput and with rapid turnover. We have screened more than 12,000 strand 10 variants using high-density peptide microarrays for binding and functional complementation in Green Fluorescent Protein. We studied the effect of peptide length and the effect of different linkers to the solid support and mapped the effect of all possible amino acid substitutions on each position as well as in the context of some single and double amino acid substitutions. As all peptides were tested in 12 duplicates, the analysis rests on a firm statistical basis allowing determination of robustness and precision of the method. We showed that the microarray fluorescence correlated with the affinity in solution between the LOO-FP and peptides. A double substitution yielded a peptide with 9-fold higher affinity than the starting peptide.


INTRODUCTION
Tens of new split fluorescent proteins (splitFPs) have been developed since the first reassembly of a splitFP was achieved 20 years ago (1). Fluorescent proteins (FPs) have been split in many creative ways, by removing fragments ranging from around half of the FP β-barrel (1) to one (2) or two (3) secondary elements. The splitFP fragments obtained have low or no fluorescence on their own, but can reassemble to form a fully functional FP. These properties have made splitFPs desirable for many bioanalytical applications, from sensing protein-protein interactions to protein detection and localization, and as tools in super-resolution microscopy (4,5).
Leave-one-out splitFPs (LOO-FPs) are variants of splitFPs in which one of the secondary elements, such as one of the β-strands or the internal α-helix, typically of less than 20 amino acids, is removed (6). Ideally, left-out elements can spontaneously associate with the LOO-FP to recover fluorescence, making them useful as tags fused to a protein of interest, as well as individual peptides for in vitro applications. Preferably, a peptide should have high solubility, affinity and brightness upon complementation of the LOO fragment, as well as a small size that interferes minimally with a potential fusion partner protein (2,5,7,8). Amino acid substitutions in the sequence of the LOO-FP fragments offer the possibility to modulate their complementation efficiency, spectral properties, solubility and photostability. Although these are clear targets for optimizing LOO-FPs, genetic engineering methods such as random mutagenesis are laborious and do not directly measure binding between the splitFP fragments (9).
High-density peptide microarrays provide a powerful technology for massively parallel screening of peptide-protein interactions. While DNA and RNA arrays have been extensively used in mappings of polynucleotide-protein interactions (10), peptide microarrays have mostly been reserved for antibody epitope mapping and screening receptor-ligand interactions (11). Peptide microarrays cannot, however, typically accommodate peptides longer than 15-20 residues and, in addition, the affinity of the peptide-protein interaction to be investigated needs to be less than µM (11). Thus, peptide microarray analysis is attractive for complementing LOO-FPs, as peptides are generally under 20 residues (6), and can bind the LOO partner with dissociation constants from hundreds of picomolar to hundreds of nanomolar (4).
LOO-FP systems can be divided into those where the chromophore is matured prior to reconstitution of the full-length protein and those where maturation takes place on reconstitution. While fluorescence recovery in the former is rapid (on the order of minutes) and essentially dependent on the rate of association of the partners, the latter may take hours, as it requires chemical condensation and oxidation of the chromophore. We hypothesized that, since binding of the split fragments with a preformed chromophore generates a fluorescent signal, their association could be followed directly by fluorescence detection on a peptide microarray. We tested this hypothesis using the superfolder "GFP split10" system, in which strand 10 is removed from the N-terminus of a circularly permuted variant by trypsin digestion, generating LOO10-GFP (12). The peptide microarray that we tested had a library of 12,544 left-out strand 10 (s10) peptide sequences in 12-fold repeats and was screened for complementation to the partner truncated protein using a fluorescence laser scanner. In contrast to DNA-based screening methods, chemical synthesis of defined peptides allowed for highly targeted analysis in which specific sequences can be queried in a non-random fashion with direct, rapid and quantitative readout.
We generated comprehensive splitFP sequence-function maps in a single experiment and with high precision, without the need of mutant selection rounds, enrichments or individual handling of clones. Analyzing peptides scanned with all possible amino acid substitutions in s10, we mapped hotspot residues and discovered improving substitutions. SplitFP complementation using peptide arrays also provided information about the sequences with lower fluorescence yield, which are generally inaccessible by genetic methods (2,7). By introducing variations in the s10 context such as the peptide length and charge of the C-terminal array surface linker, we demonstrated the robustness of the assay. Finally, we assessed the accuracy of the microarray platform by characterizing interesting s10 sequences spectrally and thermodynamically in solution.

Experimental setup
LOO10-GFP was obtained from a circularly permuted superfolder GFP (cp-sfGFP), with β-strand 10 engineered at the N-terminus, employing minor modifications to an established protocol (12). S10 was removed by trypsin digestion followed by size exclusion chromatography in denaturing conditions, to yield LOO10-GFP (Figure 1A). Upon refolding in native buffer, LOO10-GFP was obtained in a fairly stable and soluble state which recovered fluorescence when reassembled with synthetic s10 variants in solution (data not shown).
The s10 peptide library was designed as a single microarray layout with 12 identical sectors, amounting to 150,528 peptide spots in total on a microscope-format slide (shown in the false-color image in Figure 1B). The s10 library was developed around the 18-mer "wild-type" s10 peptide L195PDNHYLSTQTVLSKDPN212 (termed s10long). Preliminary experiments had shown that a truncation to a core sequence of 11 residues, N198HYLSTQTVLS208 (s10short), forming a minimal β-strand, was also able to complement in solution (data not shown). In addition, preliminary microarray screens showed that 7-mer charged linkers for the short format would be beneficial (data not shown). Thus, 7-mer linkers with positive, neutral and negative charges were screened for s10short in order to increase signal-to-noise, but also to assess signal robustness. We also wished to study how 1 amino acid shifts to the left and right of the sequence of s10short influenced fluorescence recovery, resulting in the s10shortL D197-L207 and the s10shortR H199-K209 variants, respectively. An overview of the length, linker and substitution variants tested for the s10 peptide can be found in Figure 1C and Table S1. To generate a comprehensive picture of the sequence tolerance, all single amino acid substitutions were synthesized for the s10long (342 variants), s10short linked to GS (gs2) for s10long and GSGSGSG (gs7), GKGSKSG (gk7) and GEGSESG (ge7) for s10short. As controls, substitutional scans of a negative control (neg) 11-mer peptide inspired by a split-luciferase (13) were synthesized to test the specificity of LOO10-GFP to s10 variants. We furthermore added linker and FLAG synthesis controls, and blank spots for background estimation. Arrays were synthesized on modified surface microscope slides using a lithographic method described previously (14) and were purchased from a commercial vendor (Schafer-N, Copenhagen).
Identical peptides were placed at the same position relative to other peptides within individual sectors. Within a sector, however, peptides were deliberately not grouped by linker or composition but placed randomly, to avoid local artifacts affecting similar peptides in a similar fashion.
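The library arithmetic above can be checked with a short sketch. It enumerates every single amino acid substitution of the s10long sequence given in the text and confirms the 342-variant and 150,528-spot counts. The function name and the mutation-label format (for example H199Y) are illustrative choices, not taken from the authors' code.

```python
# Sketch: enumerate a single-substitution scan of the s10long peptide.
# The sequence and numbering (195-212) come from the text; the helper
# name and the labelling scheme are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 canonical residues
S10_LONG = "LPDNHYLSTQTVLSKDPN"        # residues L195 to N212
FIRST_RESIDUE = 195


def single_substitution_scan(sequence, first_residue):
    """Return {label: variant_sequence} for all single substitutions."""
    variants = {}
    for offset, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue  # skip the unchanged sequence
            label = f"{wild_type}{first_residue + offset}{residue}"
            variants[label] = sequence[:offset] + residue + sequence[offset + 1:]
    return variants


scan = single_substitution_scan(S10_LONG, FIRST_RESIDUE)
total_spots = 12_544 * 12  # peptides per sector times number of sectors

print(len(scan), total_spots)
```

Each of the 18 positions accepts 19 alternative residues, which gives the 342 variants quoted for s10long.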
To determine binding and recovery of activity, we incubated the peptide microarray with LOO10-GFP, and after a 10 min wash step the microarray was briefly dried and imaged at 488 nm excitation and 520 nm emission using a microarray scanner (see Methods for details; Figure 1D).
Image analysis of the fluorescent spots allowed us to assign each peptide in the library a fluorescent signal.

Microarray assay performance
Before determining any substitutional effects, we quantitatively assessed the performance of the microarray assay in terms of precision, specificity and robustness. Data handling and statistical analyses are detailed in Methods. The high precision of the microarray method was demonstrated by correlating the variant libraries in all 12 sector replicas, which showed Pearson coefficients > 0.94 (Figure S1).
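A replica-to-replica precision check of this kind can be sketched as follows. The data are simulated (a hypothetical true signal per peptide plus Gaussian noise), so the numbers only illustrate the calculation and do not reproduce Figure S1.

```python
# Sketch: pairwise Pearson correlation between sector replicas.
# All signals below are simulated; noise level and library size are
# hypothetical values chosen for illustration only.
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def min_pairwise_pearson(sectors):
    """Lowest correlation found among all pairs of sectors."""
    return min(
        pearson(sectors[i], sectors[j])
        for i in range(len(sectors))
        for j in range(i + 1, len(sectors))
    )


rng = random.Random(0)
true_signal = [rng.uniform(0, 1000) for _ in range(500)]       # one value per peptide
sectors = [[s + rng.gauss(0, 20) for s in true_signal] for _ in range(12)]
min_r = min_pairwise_pearson(sectors)
```

Reporting the minimum over all 66 sector pairs gives a conservative summary of precision.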
Binding to positively charged peptides by low pI proteins is a common concern with peptide microarrays (15,16). We tested whether the low-fluorescence LOO10-GFP might have nonspecific bias towards highly charged peptide spots, leading to false-positive signals. Results showed that microarray fluorescence was independent of formal peptide charge and was distributed around s10long WT, which has a formal net charge of -1 (Figure 2A). In addition, the fluorescent complementation was specific towards s10 variants, as the signal from all negative control peptides was minimal (Figure 2B).
Comparing long and short length variants, we found that peptide length and charged C-terminal linker variants can have an influence on the dynamic range of fluorescence. In Figure 2C we compared the same substitution variants in the long versus the short peptide format, with or without a neutral linker. The long and short variants correlated well with each other (Pearson coefficients > 0.98), but the signal from s10long peptides was substantially higher than from the s10short format in terms of the dynamic range. The various short formats were all well correlated with each other for individual substitutions (Figure S2). For the s10short format a linker is clearly preferable, presumably to increase the distance to the microarray surface. The shifted variants had low signal-to-noise, but seemed to follow a nearly one-to-one relationship with short variants (Figure S3). This suggests that the 9-mer central sequence HYLSTQTVL could potentially work per se as a split 10 fragment.
The main conclusion from this comparison is that although there are differences in the performance of individual linker designs, the variant sequences correlated very well with each other between linker formats. Overall, the microarray signals are highly robust with regards to ranking across substitution variants. However, due to the different dynamic ranges obtained at different length and linker formats, substitutional effects should always be interpreted within the same peptide format.

Effect of substitutions on s10long array signal
Since all peptide formats gave sequence-dependent correlated signals, we will in the following discuss substitutional effects in the s10long format without linker, since this format yielded the highest signals and dynamic range. The heatmap in Figure 3A shows the effect of all s10long single residue substitutions on the fluorescence of the s10long:LOO10-GFP complex. Some increase in fluorescence when substituting T203 with hydrophobic residues Y, F, I and V was expected, because these substitutions are known to cause a red shift in the fluorescence emission maximum from 506 nm to, depending on the context, 515-527 nm (12,17-19). The readout using a 520 nm emission filter could favor the T203 red-shifted substitutions. Besides the effects at position 203, replacing H199 with a series of hydrophobic residues (Y, F, I, V, L) also resulted in increased signal. Specifically, introducing H199Y as a single substitution offered 10-fold greater fluorescence compared to WT.
To evaluate the importance of each of the WT residues, we assigned each s10 position a substitutional tolerance value from 1 (lowest) to 19 (highest) based on the number of substitutions it could accept at each position without gaining or losing function, as described in Methods.
Mapping this onto the superfolder GFP crystal structure (Figure 3B) [...] sheet-disruptive P and G were, as expected, detrimental (20). We noted that substitution to H had a negative impact on almost all positions. [...] (Figure S4), suggesting that these are the three most constrained s10 positions.

How well does array signal reflect binding affinity in solution?
The effect of the most beneficial substitutions was hypothesized to reflect an affinity increase at sub-saturation concentrations of LOO10-GFP (Figure S6). On the other hand, intrinsic brightness of FP complexes at the wavelength of measurement would also affect apparent intensity. To address this issue, we chose to analyze the binding affinity and brightness, in solution, of the reconstituted GFP with the s10 WT peptide and three gain-of-function variants (H199Y, T203Y and H199Y/T203Y) in the short format and one loss-of-function variant in the long format (L207R). We examined whether their microarray signals correlated with brightness and/or affinity of the peptide variants upon reassembly with LOO10-GFP in solution.
Upon complementation of LOO10-GFP at saturating concentrations of each s10 variant, we plotted individual emission spectra relative to WT (Figure 4A). We observed no fluorescent complementation when LOO10-GFP was incubated with the L207R variant; the spectrum of this complex was unchanged relative to LOO10-GFP. S10 WT complemented LOO10-GFP with a 5-fold increase in 520 nm signal, but introducing H199Y as a single substitution had no additional effect on brightness. The T203Y variant shifted the emission maximum of the recovered FP complex from 506 to 520 nm, giving ~1.6-fold higher brightness than WT at 520 nm. H199Y in combination with T203Y increased the 520 nm emission 1.8-fold compared to WT. The loss-of-function effect of L207R and the gain-of-function effects of the T203Y variants were both captured by the microarray fluorescence of these variants. Still, brightness effects in solution could not explain the microarray data, as correlating the two datasets gave an R² < 0.5 (Figure 4B). In particular, H199Y, with a 10-fold increase in microarray fluorescence as a single substitution, showed no brightness effects when saturating LOO10-GFP with this variant in solution.
Next, we titrated LOO10-GFP with increasing concentrations of s10 variant in solution and fitted the binding curves (Figure 4C-F) to calculate dissociation constants, Kd, as a measure of affinity between the fragments. The H199Y substitution showed a ~5-fold increase in affinity compared to WT, while the double-substituted H199Y/T203Y had a ~9-fold increase. The ~2-fold increase in affinity of T203Y relative to WT previously reported for a 19-mer s10 (12) is replicated by our data. Competition experiments suggested that the L207R variant binds LOO10-GFP very weakly and is almost fully displaced by a 4-fold lower concentration of the WT variant (Figure S7). We therefore approximated the affinity of L207R to be at least 10-fold weaker than WT.
Assuming a sub-saturation regime where Kd is greater than the LOO10-GFP concentration, the association constants, 1/Kd, should scale linearly with array fluorescence (see Methods). Under this assumption, affinity effects are more likely to explain the array fluorescence, although they do not take into account the spectral contributions (Figure 4G). Furthermore, taking the spectral shift for T203Y and H199Y/T203Y into account, the spectrally corrected microarray fluorescence offered a slightly better fit, although the improvement is probably not significant (Figure 4H). Based on these experiments, we conclude that with affinities in the nM range, the signal intensity on peptide microarrays in this format faithfully reflects the binding affinity between the split fragments.
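The sub-saturation argument can be illustrated numerically. The sketch below assumes a simple 1:1 binding isotherm and uses hypothetical concentrations in arbitrary units; only the 5-fold affinity ratio is taken from the text. Well below Kd the signal ratio between two peptides approaches the ratio of their association constants, whereas near saturation the ratio collapses towards 1.

```python
# Sketch: why array signal tracks 1/Kd only below saturation.
# Assumes a 1:1 binding isotherm; KD_WT and the protein concentrations
# are hypothetical values in arbitrary units.
def fraction_bound(protein_conc, kd):
    """Fraction of immobilized peptide occupied by LOO10-GFP."""
    return protein_conc / (protein_conc + kd)


def relative_signal(protein_conc, kd_variant, kd_wt):
    """Variant signal relative to WT at a given protein concentration."""
    return fraction_bound(protein_conc, kd_variant) / fraction_bound(protein_conc, kd_wt)


KD_WT = 1000.0
KD_VARIANT = KD_WT / 5.0  # a variant with 5-fold higher affinity

sub_saturation_ratio = relative_signal(1.0, KD_VARIANT, KD_WT)  # protein far below Kd
saturation_ratio = relative_signal(1e6, KD_VARIANT, KD_WT)      # protein far above Kd
```

This is also why saturating conditions would report brightness differences rather than affinity differences.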

Figure 4. (C-F) Binding isotherms of (C) WT, (D) H199Y (390ex/506em) and (E) T203Y, (F) H199Y/T203Y (495ex/520em) variants. Error bars represent standard deviation of 3 independent measurements. (G) Correlation between relative microarray fluorescence and relative affinity of the tested split fragments in solution agrees well with a linear model. (H) Combining the spectral and affinity properties in solution still explains the microarray fluorescence.

DISCUSSION
We have developed a precise, robust and accurate method for exploring the substitutional landscape of leave-one-out split fluorescent proteins and used it to generate a comprehensive sequence-function map of a splitFP tag. Chemical synthesis of peptides in the library avoided time-consuming and bias-inducing steps like cloning, expression, purification or sequencing that are usually required in genetic screens. By having full control over the s10 sequence on the microarray, we circumvented the limitations of the DNA codon table. Because double and triple substitutions in a given codon are rare, random mutagenesis will typically only generate a subset of mutations (21) and will never exhaustively sample double or triple amino acid substitutions. In the microarray setup it is possible to investigate every peptide in the library independently of its Hamming distance from the starting sequence at the DNA level. Thus, libraries can be made highly diverse (potentially including non-natural amino acid residues), offering a one-to-one picture of the entire functional/binding landscape, including low and medium performing sequences. In addition, because the LOO-FP chromophore is matured prior to our assay, we accessed high affinity s10 variants that might not be discoverable by multiplexed expression of the full-length proteins, where chromophore cyclisation and oxidation are prerequisites for fluorescence. Indeed, the H199Y substitution either does not show up or is identified as likely destabilizing in other GFP genetic screens (21,22). It is an interesting possibility that this variant, while stabilizing the mature GFP, is disfavored in genetic screens because it does not stabilize the non-fluorescent immature precursor. Lastly, our assay provided a direct measure of peptide binding in high throughput, avoiding false positives caused by oligomeric fluorescent species that can appear in genetic selection experiments performed in cells (9).
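The codon-table limitation can be made concrete with a short sketch that lists which amino acids are reachable from a codon by a single nucleotide change, using the standard genetic code. The codons chosen for His and Thr below are hypothetical examples; the text does not state which codons encode H199 and T203 in the authors' construct.

```python
# Sketch: amino acids reachable by one nucleotide change in a codon.
# Uses the standard genetic code. The example codons (CAT for His,
# ACT for Thr) are hypothetical and not taken from the construct.
from itertools import product

BASES = "TCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AA_BY_CODON)
}


def single_nucleotide_neighbors(codon):
    """Amino acids (excluding wild type and stop) one base change away."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid not in ("*", wild_type):
                reachable.add(amino_acid)
    return reachable


his_neighbors = single_nucleotide_neighbors("CAT")  # hypothetical His codon
thr_neighbors = single_nucleotide_neighbors("ACT")  # hypothetical Thr codon
```

With these example codons, a His to Tyr change needs only one base change, while Thr to Tyr needs at least two, so a position-by-position peptide scan covers substitutions that single-base mutagenesis cannot reach.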
We should point out that the interpretation of results in the microarray platform assumes a similar chemical yield of peptide across variants. The level of reproducibility between 12 replicas suggested intra-sequence synthesis yields are very similar. Inter-sequence yields were more difficult to assess, but the consistently lower signals when incorporating histidine in any part of s10 might suggest a reduced coupling yield when histidine is incorporated in the sequence.
However, low histidine coupling yield was not observed in another study using microarrays from the same manufacturer (14).
In this analysis, we identified several interesting GFP strand 10 peptide substitutions and truncations. In particular, we note a double substitution, H199Y/T203F, which presented 54-fold higher microarray fluorescence compared to the WT sequence. We also found that the H199Y/T203Y s10 variant had almost 10-fold higher solution affinity compared to WT.
Truncations of s10 to an 11-mer or even a 9-mer proved active on the peptide microarray, and the 11-mer also effectively reconstituted fluorescence as a free peptide in solution. These short versions of s10 with the H199Y substitution could be readily used for in vitro applications, and could possibly be further investigated as non-interfering protein tags due to their small size.
For developing this screening platform, we used the split 10 FP system as a model, mainly due to its high affinity in vitro. Still, the LOO10-GFP efficiency in cellulo proved poor in previous studies (6,23). For engineering improved in cellulo and in vivo tags using the microarray platform reported here, one could turn to the strand 7 and strand 11 LOO systems (23). By incubating a microarray library with an immature LOO-FP, one could study what the sequence requirements would be for both binding and chromophore maturation. Indeed, β-strand-assisted chromophore maturation of LOO11-GFP on solid support has already been demonstrated (24).
Reaching saturation for every peptide could theoretically offer the possibility of detecting variants with intrinsic FP brightness improvement, desirable for many applications. A limitation in our study was that LOO10-GFP is a rather unstable protein with fairly low solubility, 2 µM being the maximum reliable concentration we could obtain for our microarray experiments. A more stable LOO10-GFP might be desirable, since titrating the microarray with increasing concentrations of LOO10-GFP could saturate all peptides and possibly allow plotting full binding curves for each peptide. This possibility was demonstrated in previous microarray screens (25,26). In the absence of strand 10, on the other hand, a hydrophobic patch is exposed, thus mutations that stabilize LOO10-GFP, e.g. by making this surface more hydrophilic, would also likely decrease affinity to strand 10.
We believe the platform reported here can be generalized to many split fluorescent and luminescent proteins. Screening on microarrays may be limited by unspecific binding, which is avoided for split systems that require specific complementation in order to function. Different color FPs could be studied, since microarray scanners employ excitation laser sources emitting in the blue, green and red spectral regions and provide multiple emission filter options. Tuning protein concentrations to the sub-saturation regime is typically not difficult, but it requires a bright chromophore and a sensitive microarray reader. For reasons of sensitivity, assaying split proteins using this technology is most likely limited to fluorescence detection. Thus, studying split enzyme systems would likely require appropriate (insoluble) fluorescent product formation. One possibility is using internally quenched substrates, similar to those used in some protease assays (27).
Overall, full control over the desired substitutions on peptide microarrays makes them versatile alternatives to cell-based approaches for rational design of binders, massively parallel testing of computational design and benchmarking biophysical prediction methods.

Preparation of LOO10-GFP
We used an adapted method from (12). Engineered plasmids of full-length circularly permuted superfolder GFP in pET-15b vectors were kindly provided by Steven Boxer, Stanford University.
The full-length GFP was expressed in BL21 (DE3) cells in commonly used AB-LB growth media, where ampicillin (VWR) was added to a concentration of 100 µg/ml and glucose to a concentration of 1% (w/v). The starter culture was grown overnight at 37°C in LB medium, then expanded in AB-LB medium in a ratio of 1:100 starter culture to growth medium. This culture was grown at [...]
[...] where F0 is the background fluorescence (FU), Fmax is the fluorescence at saturation (FU), G is the LOO10-GFP concentration (nM), P is the s10 peptide concentration (nM) and Kd is the dissociation constant between the two fragments (nM). All fits were performed in OriginPro 2017.
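The fitted expression itself is not reproduced in the text above. As a minimal sketch, the symbols F0, Fmax, G, P and Kd are consistent with the standard quadratic 1:1 binding model, which is assumed here and may differ from the exact form the authors fitted. The coarse grid search over Kd stands in for the nonlinear fit done in OriginPro, and the titration data are simulated.

```python
# Sketch: quadratic 1:1 binding model and a coarse Kd estimate.
# The model form is an assumption consistent with the symbols defined
# in the text; the grid search replaces a proper nonlinear fit and the
# titration data are simulated, not measured.
import math


def complemented_fluorescence(P, F0, Fmax, G, Kd):
    """Fluorescence at peptide concentration P (all concentrations in nM)."""
    b = G + P + Kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * G * P)) / 2.0
    return F0 + (Fmax - F0) * complex_conc / G


def fit_kd(peptide_concs, signals, F0, Fmax, G):
    """Kd on a log-spaced grid (0.1 nM to 100 uM) minimizing squared error."""
    candidates = [10 ** (i / 100) for i in range(-100, 501)]

    def squared_error(kd):
        return sum(
            (complemented_fluorescence(p, F0, Fmax, G, kd) - f) ** 2
            for p, f in zip(peptide_concs, signals)
        )

    return min(candidates, key=squared_error)


# Simulated titration with hypothetical parameters (Kd = 50 nM, G = 10 nM).
concs = [0, 5, 10, 25, 50, 100, 250, 500, 1000]
signals = [complemented_fluorescence(p, 0.0, 100.0, 10.0, 50.0) for p in concs]
kd_fit = fit_kd(concs, signals, F0=0.0, Fmax=100.0, G=10.0)
```

The quadratic form accounts for peptide depletion, which matters when G is not negligible compared with Kd.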

Absorption spectroscopy
Concentration of LOO10-GFP was estimated by measuring absorbance at 447 nm and using its extinction coefficient in 0.1 M NaOH of ε447 = 44,100 M⁻¹ cm⁻¹ (12). The s10 peptides were dissolved in assay buffer and the predicted ε280 based on tryptophan and tyrosine content was used for concentration determination (28). All absorbance measurements were done on a Perkin Elmer Lambda 35 Spectrophotometer.
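Both concentration estimates follow from the Beer-Lambert law, sketched below. The ε447 value is taken from the text. The per-residue coefficients for tryptophan and tyrosine at 280 nm (5,500 and 1,490 M⁻¹ cm⁻¹) are widely used values assumed here for illustration and may differ from those in the cited reference (28). The absorbance reading and the 1 cm path length are hypothetical.

```python
# Sketch: concentration from absorbance (Beer-Lambert law, A = eps * c * l).
# EPSILON_447 comes from the text; the Trp/Tyr coefficients, the example
# absorbance and the 1 cm path length are assumptions for illustration.
EPSILON_447 = 44_100.0  # M^-1 cm^-1, LOO10-GFP in 0.1 M NaOH
EPSILON_TRP_280 = 5_500.0  # M^-1 cm^-1 per Trp (assumed value)
EPSILON_TYR_280 = 1_490.0  # M^-1 cm^-1 per Tyr (assumed value)


def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Molar concentration from an absorbance reading."""
    return absorbance / (epsilon * path_cm)


def predicted_epsilon_280(sequence):
    """Predicted peptide extinction coefficient from Trp and Tyr content."""
    return sequence.count("W") * EPSILON_TRP_280 + sequence.count("Y") * EPSILON_TYR_280


protein_conc = concentration_molar(0.0882, EPSILON_447)   # hypothetical reading
wt_epsilon = predicted_epsilon_280("LPDNHYLSTQTVLSKDPN")  # s10long WT, one Tyr
```

Because s10 WT contains a single tyrosine and no tryptophan, tyrosine substitutions such as H199Y change the predicted ε280 and have to be accounted for per variant.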

Microarray incubation and data analysis
The peptide library was synthesized on a single chip with 12 identical sectors, each containing [...] Any outliers caused by dust or other contamination were identified by inspecting the distribution of the 12 replicas for each peptide. We chose a conservative cutoff for outlier removal, since contaminants should be very bright compared to a regular high signal. Thus, all values within 6 times the Median Absolute Deviation (MAD) of the median for each peptide were taken into further analysis, while the rest were eliminated as outliers. MAD is generally not influenced by outliers, so the observation that the standard deviation after outlier removal is similar to the MAD before outlier removal (Figure S8) suggests that most outliers were successfully eliminated.
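The outlier rule described above can be sketched for a single peptide as follows. The replica values are invented for illustration, and the MAD is used unscaled (no 1.4826 consistency factor), which is an assumption, since the text does not say whether a scaling factor was applied.

```python
# Sketch: keep replicas within 6 x MAD of the median, per peptide.
# The replica values are invented; using the raw (unscaled) MAD is an
# assumption, as the text does not specify a scaling factor.
from statistics import median


def remove_outliers(replicas, n_mads=6.0):
    """Return the replicas lying within n_mads * MAD of the median."""
    center = median(replicas)
    mad = median([abs(value - center) for value in replicas])
    # If MAD is zero, only values equal to the median are retained.
    return [value for value in replicas if abs(value - center) <= n_mads * mad]


# Twelve hypothetical replicas of one peptide, one contaminated by dust.
replicas = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99, 102, 5000]
kept = remove_outliers(replicas)
```

In this example the median is 100.5 and the MAD is 1.5, so only the contaminated 5000 reading falls outside the 9-unit window.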
As each of the 12 sectors was, in principle, identical, the average sector signal should be the same. However, the fact that average brightness shifted in a consistent gradient across the slide suggested an artifactual global variation (Figure S9). This could be due to inhomogeneity of the functional surface or inhomogeneous distribution of buffer components during the incubation and washing steps.

Thermodynamic description of microarray binding and spectral correction
The equilibrium between LOO10-GFP, P, immobile s10 peptide, S, and the fluorescent complex, PS, gives the fraction of fluorescent chromophores per microarray spot. We assume this fraction to be proportional to the observed fluorescence. For spectral correction, the microarray fluorescence of T203Y and H199Y/T203Y is simply divided by the spectral shift factor of 1.6 determined from Figure 4A.
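The expressions for this description are not reproduced in the text above. A minimal sketch of a standard 1:1 equilibrium treatment, consistent with the symbols P, S and PS defined here and with the linear dependence on 1/Kd stated in the Results, would be the following. The symbols θ and F_corr are labels introduced for illustration, and the exact form used by the authors may differ.

```latex
% Assumed 1:1 binding at equilibrium; \theta and F_{\mathrm{corr}} are
% labels introduced here, not symbols from the original text.
\begin{align}
  P + S &\rightleftharpoons PS, &
  K_d &= \frac{[P][S]}{[PS]} \\
  \theta &= \frac{[PS]}{[S] + [PS]} = \frac{[P]}{[P] + K_d} \\
  F &\propto \theta \approx [P]\,K_d^{-1}
    \quad \text{for } [P] \ll K_d \\
  F_{\mathrm{corr}} &= \frac{F}{1.6}
    \quad \text{(T203Y and H199Y/T203Y only)}
\end{align}
```

Under the sub-saturation condition the LOO10-GFP concentration is the same for every spot, so differences in fluorescence between peptides reduce to differences in 1/Kd.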

Data and code availability
The raw data from the microarray screen and the R code used to clean, sort and plot the data are available at https://github.com/onea7/Substitutional-landscape-splitFP.