Developability

Introduction

Antibody developability refers to the physicochemical and biochemical properties that influence an antibody's suitability for development, manufacturing, and clinical use. In addition to strong target affinity, a viable therapeutic antibody must exhibit favorable solubility, stability, manufacturability, viscosity, and low risk of aggregation and other potential liabilities.
The resulting output provides an overview of the predicted developability profiles of the screened antibodies, based on computational approaches that incorporate both sequence- and structure-based analyses. The evaluated metrics combine machine learning models trained on clinical-stage antibody data with physicochemical property assessments.

Submit a job

Below is a figure of the Developability analysis on the portal. The version may change as the workflow is developed, but using the latest version is always recommended.

Clicking on the RUN JOB button, we can launch a Developability job.

Inputs

The input of the job is a set of antibody sequences whose developability properties are to be assessed. These can be provided in FASTA or CSV format, subject to a few small requirements (shown on the right-hand side of the application and repeated below).

Select a single FASTA file containing antibody sequences for developability assessment. FASTA headers must end with '_VH' for heavy chains and '_VL' for light chains. Both chains from the same antibody must share the same base name (e.g., "Ab1_VH" and "Ab1_VL"). For nanobodies, use the nanobody name with a '_VH' suffix; for scFv sequences, use the scFv name without a chain suffix. The developability assessment supports antibodies with Fc domains, though not all predictions evaluate the Fc region.

Or select a single CSV file containing antibody sequences for developability assessment. The antibody name and the VH and VL sequences must be in separate columns, named "protein_name", "VH_sequence" and "VL_sequence", respectively. For single-chain antibodies (VHH or scFv), only two columns are required: "protein_name" (for the name) and "VH_sequence" (for the sequence).

Example of a valid input FASTA file:

>Atezolizumab_VH
EVQLVESGGGLVQPGGSLRLSCAASGFTFSDSWIHWVRQAPGKGLEWVAWISPYGGSTYYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARRHWPGGFDYWGQGTLVTVSS
>Atezolizumab_VL
DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYLYHPATFGQGTKVEIK
>Bapineuzumab_VH
EVQLLESGGGLVQPGGSLRLSCAASGFTFSNYGMSWVRQAPGKGLEWVASIRSGGGRTYYSDNVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCVRYDHYSGSSDYWGQGTLVTVSS
>Bapineuzumab_VL
DVVMTQSPLSLPVTPGEPASISCKSSQSLLDSDGKTYLNWLLQKPGQSPQRLIYLVSKLDSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCWQGTHFPRTFGQGTKVEIK
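
The header rules above can be checked before uploading. The following is a minimal sketch, not part of the workflow itself: the function names read_fasta and check_paired_headers are hypothetical, and the check covers only paired VH/VL antibodies (a nanobody with only a '_VH' entry, or an scFv without a suffix, would be reported here even though the workflow accepts them with the appropriate option).

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_paired_headers(records):
    """Return a list of problems found in paired VH/VL FASTA headers."""
    chains = defaultdict(set)
    problems = []
    for header, sequence in records:
        # Split "Ab1_VH" into the base name "Ab1" and the suffix "VH".
        base, sep, suffix = header.rpartition("_")
        if not sep or suffix not in ("VH", "VL"):
            problems.append(f"{header}: header must end with _VH or _VL")
            continue
        if suffix in chains[base]:
            problems.append(f"{header}: duplicate {suffix} entry")
        chains[base].add(suffix)
        if not sequence:
            problems.append(f"{header}: empty sequence")
    # Every base name needs both chains.
    for base, found in chains.items():
        for missing in sorted({"VH", "VL"} - found):
            problems.append(f"{base}: missing {missing} chain")
    return problems


example = ">Ab1_VH\nEVQLVESGG\n>Ab1_VL\nDIQMTQSPS\n>Ab2_VH\nQVQLVQSGA\n"
print(check_paired_headers(read_fasta(example)))
```

An empty list means no problem was found; in the example, Ab2 is reported because it has no '_VL' entry.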

Example of a valid input CSV file:

protein_name	VH_sequence	VL_sequence
Adalimumab	EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSAITWNSGHIDYADSVEGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAKVSYLSTASSLDYWGQGTLVTVSS	DIQMTQSPSSLSASVGDRVTITCRASQGIRNYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTISSLQPEDVATYYCQRYNRAPYTFGQGTKVEIK
Natalizumab	QVQLVQSGAEVKKPGASVKVSCKASGFNIKDTYIHWVRQAPGQRLEWMGRIDPANGYTKYDPKFQGRVTITADTSASTAYMELSSLRSEDTAVYYCAREGYYGNYGVYAMDYWGQGTLVTVSS	DIQMTQSPSSLSASVGDRVTITCKTSQDINKYMAWYQQTPGKAPRLLIHYTSALQPGIPSRFSGSGSGRDYTFTISSLQPEDIATYYCLQYDNLWTFGQGTKVEIK
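
The column requirements can likewise be checked locally before upload. This is a minimal sketch under stated assumptions: the function name missing_columns is hypothetical, and the delimiter is left as a parameter because the separator expected by the workflow should be confirmed on the portal.

```python
import csv
import io

PAIRED_COLUMNS = ("protein_name", "VH_sequence", "VL_sequence")
SINGLE_CHAIN_COLUMNS = ("protein_name", "VH_sequence")


def missing_columns(text, single_chain=False, delimiter=","):
    """Return the required column names that are absent from the header row."""
    required = SINGLE_CHAIN_COLUMNS if single_chain else PAIRED_COLUMNS
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # An empty file has no header row, so every required column is missing.
    header = [name.strip() for name in next(reader, [])]
    return [name for name in required if name not in header]


print(missing_columns("protein_name,VH_sequence\nNb1,EVQLVESGG\n"))
print(missing_columns("protein_name,VH_sequence\nNb1,EVQLVESGG\n", single_chain=True))
```

The first call reports the missing VL_sequence column for a paired input; the second passes because single-chain inputs need only two columns.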

The pipeline also accepts single-chain (scFv) or single-domain (VHH) antibodies when the appropriate option is selected.

In the input field of either input_fasta_dataset or input_csv_dataset, the uploaded input file should be selected.


By default, the workflow generates HTML reports, including a summary and individual reports for the input antibody sequences. If there are more than 200 input antibodies, the workflow automatically skips the individual reports to save time. Also, if there are very many input sequences (e.g. over 1000), the report generation option should be deselected, since generating reports for a large number of antibodies takes considerable time.
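
The thresholds described above can be summarized as a small decision helper. This is only an illustration of the rule as stated in the text; the function name report_plan and the returned labels are hypothetical and do not correspond to workflow options.

```python
INDIVIDUAL_REPORT_LIMIT = 200  # above this, individual reports are skipped automatically
REPORT_ADVISORY_LIMIT = 1000   # above this, deselecting report generation is advised


def report_plan(n_antibodies):
    """Describe which reports to expect or request for a given input size."""
    if n_antibodies > REPORT_ADVISORY_LIMIT:
        return "deselect report generation"
    if n_antibodies > INDIVIDUAL_REPORT_LIMIT:
        return "summary report only"
    return "summary and individual reports"


for size in (50, 500, 5000):
    print(size, report_plan(size))
```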

Expected outputs

The expected outputs are as below:

developability_summary.html: The summary report containing overview information and main results of the developability assessments.
Reports: Contains an individual report for each screened antibody in HTML format. The individual report provides a more detailed description of the developability properties, particularly the comparison of its properties with those of clinical-stage antibodies, as well as the hydrophobicity and electrostatic potential surface of the antibody structure.
Results_CSVs: The predicted developability metrics/parameters in CSV format:
- main_results.csv: The main predicted metrics, which are also shown in the summary report (developability_summary.html).
- consolidated_metrics.csv: All predicted metrics/parameters in a single consolidated CSV file.
- dev_motifs.csv: Chemical developability (DEV) motif liabilities along with antibody sequences. These are also included in the consolidated_metrics.csv.
- ns_motifs.csv: Non-specificity (NS) motif liabilities along with antibody sequences. These are also included in the consolidated_metrics.csv.
Models_HTML: Contains the HTML visualization of each predicted antibody structure, with features to visualize electrostatic and hydrophobic potentials. This HTML visualization is also integrated into the individual report of each antibody.

Model_PDB: Consists of PDB models with PSH (Patches of Surface Hydrophobicity), PPC (Patches of Positive Charge) and PNC (Patches of Negative Charge) features mapped as B-factors. These features can be visualized using the Mol* viewer integrated on the portal or locally with PyMOL. The surface hydrophobicity patches can also be viewed directly from the SUMMARY TABLE (column: PDB model with hydrophobicity patches).

Visualize B-factors using Mol*

We can use a natural-language command such as “show b-factor as uncertainty” in the Question box.

We can also use interface buttons:
- Type: Spacefill or Surface or Cartoon, …
- Color Theme: Uncertainty/Disorder


Visualize B-factors using PyMOL

Use simple command lines:

spectrum b, blue_white_red # or white_red
as surface # or as spheres
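
The mapped values can also be read without a viewer, since the PDB format stores the B-factor in columns 61-66 of each ATOM record. The sketch below is an assumption-laden illustration: the function name residue_bfactors is hypothetical, and it assumes that the value of the first atom of a residue is representative of the whole residue, which should be checked against the actual Model_PDB files.

```python
def residue_bfactors(pdb_text):
    """Map (chain, residue number, insertion code, residue name) to a B-factor.

    Only the first ATOM record of each residue is used (assumption: the
    mapped feature value is the same for every atom of a residue).
    """
    values = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = (
            line[21],              # chain identifier (column 22)
            int(line[22:26]),      # residue sequence number (columns 23-26)
            line[26].strip(),      # insertion code (column 27), common in antibody numbering
            line[17:20].strip(),   # residue name (columns 18-20)
        )
        values.setdefault(key, float(line[60:66]))  # B-factor (columns 61-66)
    return values
```

A typical call would be residue_bfactors(open("model.pdb").read()), where "model.pdb" stands for any file from the Model_PDB output.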

Comprehensive description

The main_results.csv contains the following developability metrics, which help identify potential developability risks and guide candidate prioritization. Particular attention is given to two composite scores (META X and META Y), which capture key biophysical properties related to cross-/self-interactions and hydrophobicity, respectively. Lower META X and META Y scores are generally associated with more favorable developability profiles, and these composite metrics are useful for ranking the screened antibodies based on predicted assay-derived properties.
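
As an illustration of such a ranking, the sketch below sorts candidates by the mean of their two composite scores, lowest first. Combining META X and META Y by a simple mean is an assumption made here for illustration, not a rule stated by the workflow, and the keys "name", "meta_x" and "meta_y" are hypothetical placeholders for the corresponding columns of main_results.csv.

```python
def rank_by_meta_scores(candidates):
    """Sort candidates so that the lowest mean of META X and META Y comes first."""
    return sorted(candidates, key=lambda c: (c["meta_x"] + c["meta_y"]) / 2)


candidates = [
    {"name": "Ab1", "meta_x": 0.8, "meta_y": 0.4},
    {"name": "Ab2", "meta_x": 0.2, "meta_y": 0.3},
    {"name": "Ab3", "meta_x": 0.5, "meta_y": 0.5},
]
print([c["name"] for c in rank_by_meta_scores(candidates)])
```

Other combinations, such as filtering on one score before sorting on the other, may suit a given project better.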

Additional metrics include the Aggregation Average Score, which estimates overall aggregation propensity; Scaled Solubility, which predicts protein solubility; pI and Spatial Positive Charge Map scores, which are particularly informative for assessing non-specific binding risk; and Viscosity Classes, which classify antibodies by predicted viscosity at high concentrations.

The flagging profiles provide a comparative reference to evaluate how closely an antibody's developability features align with clinical-stage therapeutics. In addition, the assessment of common chemical liabilities and non-specificity motifs helps identify potential developability risks early in the candidate selection process.

META X (charge feature) : Represents the averaged rescaled ranks of ELISA (enzyme-linked immunosorbent assay), BVP (Baculovirus particle ELISA), PSR (poly-specificity reagent), CSI (clone self-interaction by biolayer interferometry), ACCSTAB (accelerated stability or AS), and CIC (cross-interaction chromatography) metrics. This composite score, which ranges from 0 to 1, reflects charge-associated properties; lower META X values suggest more favorable charge characteristics and reduced risk of cross- or self-interactions.
META Y (hydrophobicity feature) : Derived by averaging rescaled ranks from SMAC (stand-up monolayer adsorption chromatography) and HIC (hydrophobic interaction chromatography) metrics. This score, which ranges from 0 to 1, captures hydrophobicity-related properties; lower META Y values indicate lower hydrophobicity, which is generally favorable for solubility and low aggregation risk.
Aggregation average score : Represents the mean residue-level aggregation propensity across the antibody structure, providing a normalized estimate of the molecule’s overall aggregation tendency. More negative scores suggest reduced aggregation risk.
Scaled Solubility : A normalized solubility score (ranging from 0 to 1) indicating predicted solubility based on sequence features. Higher values indicate better solubility.
pI : Isoelectric point. It is the pH at which a molecule, typically a protein, carries no net electric charge. Relates to solubility, stability, and behavior in electrophoretic or chromatographic systems. pI is also associated with Heparin Affinity Chromatography (HAC), which is used to assess non-specific binding.
PSH_flag : Patches of Surface Hydrophobicity. Measures the size of solvent-exposed hydrophobic residues in CDR regions. This flag indicates values which are normal (green), extremal (orange) or out of the main distribution for clinical-stage antibodies (red). High levels of hydrophobicity, particularly in CDRs, have been implicated in aggregation, viscosity, and polyspecificity.
PNC_flag : Patches of Negative Charge. Measures the size of solvent-exposed negatively charged residues. This flag indicates values which are normal (green), extremal (orange) or out of the main distribution for clinical-stage antibodies (red). Patches of Negative Charge in the CDRs can be linked to high rates of clearance and poor expression levels.
PPC_flag : Patches of Positive Charge. Measures the size of solvent-exposed positively charged residues in CDR regions. This flag indicates values which are normal (green), extremal (orange) or out of the main distribution for clinical-stage antibodies (red). Patches of Positive Charge in the CDRs can be linked to high rates of clearance and poor expression levels.
SFvCSP_flag : Structural Fv Charge Symmetry Parameter. Measures the charge symmetry of the Fv domain. This flag indicates values which are normal (green), extremal (orange) or out of the main distribution for clinical-stage antibodies (red). Asymmetry can be associated with poor biophysical behavior.
Viscosity classes : Classifies the molecule as either low viscosity (0: <= 20 cP) or high viscosity (1: > 20 cP). High viscosity makes subcutaneous delivery difficult.
num of DEV liabilities : Number of detected motifs associated with developability (DEV) liabilities (including both solvent-exposed and non-exposed motifs).
num of fully exposed DEV liabilities : Number of detected motifs with fully solvent-exposed residues associated with DEV liabilities. These exposed motifs represent a higher potential risk, as they are more susceptible to solvent-driven chemical degradation.
num of NS liabilities : Number of detected motifs associated with non-specificity (NS) liabilities (including both solvent-exposed and non-exposed motifs).
Spatial positive charge map : Spatial positive charge map score in the Fv region (SCM_pos_Fv). Higher scores may indicate a greater risk of non-specific interactions, particularly with negatively charged molecules such as heparin. This metric correlates with Heparin Affinity Chromatography (HAC), which is commonly used to assess non-specific binding liabilities in antibodies.
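
The "averaged rescaled ranks" behind META X and META Y can be illustrated as follows. This is a sketch of the general technique only, not the workflow's implementation: it assumes that each assay is ranked in ascending order within the supplied set of antibodies, that ties share their average rank, and that ranks are rescaled linearly to the range 0 to 1. The reference population and the direction used for each assay in the actual workflow are not specified in this document, and the function names are hypothetical.

```python
def rescaled_ranks(values):
    """Rank values ascending (ties share their average rank) and rescale to [0, 1]."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2  # average zero-based rank of the tied group
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return [rank / (n - 1) for rank in ranks]


def composite_score(assay_columns):
    """Average the rescaled ranks of several assays, antibody by antibody."""
    per_assay = [rescaled_ranks(column) for column in assay_columns]
    return [sum(scores) / len(scores) for scores in zip(*per_assay)]


# Hypothetical predicted values for three antibodies in two assays.
hic = [10.0, 12.0, 11.0]
smac = [8.5, 9.5, 9.0]
print(composite_score([hic, smac]))
```

In this toy example the first antibody has the lowest value in both assays and therefore the lowest composite score.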

Additional parameters (found in the consolidated_metrics.csv file)

"HIC.pred" : "Hydrophobic Interaction Chromatography. The relative hydrophobicity of biomolecules based on their interaction with hydrophobic ligands on a chromatography resin under high-salt conditions.",
"SMAC.pred" : "Stand-up monolayer adsorption chromatography.",
"SGAC.pred" : "Salt-Gradient Affinity-Capture Self-Interaction Nanoparticle Spectroscopy. Measures solubility and aggregation risk.",
"CIC.pred" : "Cross-interaction chromatography. Measures propensity of a protein to interact with other proteins, indicating its cross-reactivity and non-specific binding tendency.",
"CSIBLI.pred" : "Clone Self-Interaction by Biolayer Interferometry. This technique provides insight into colloidal stability and aggregation risk.",
"ACSINS.pred" : "Affinity-Capture Self-Interaction Nanoparticle Spectroscopy. Evaluates colloidal stability and developability risk.",
"HEK.pred" : "Human Embryonic Kidney cell expression titer. Evaluates protein expression.",
"PSR.pred" : "Poly-Specificity Reagent. Assesses the binding tendency of antibodies by detecting non-specific interactions with a mixture of membrane proteins.",
"ELISA.pred" : "Enzyme-Linked Immunosorbent Assay. Measures the presence and concentration of a specific antigen or antibody using enzyme-linked detection on a solid surface.",
"BVPELISA.pred" : "Baculovirus Particle ELISA. Measures the binding of antibodies to baculovirus particles displaying membrane proteins, to assess specificity and membrane context recognition.",
"DSF.pred" : "Differential Scanning Fluorimetry. Assesses the thermal stability of proteins by detecting changes in fluorescence as the protein unfolds with increasing temperature.",
"ACCSTAB.pred" : "Accelerated Stability. Evaluates colloidal stability and aggregation risk after thermal or mechanical stress.",
"Agg_score_sum" : "A structure-based score summing individual residue-level aggregation propensities. It uses an empirically calibrated amino acid aggregation scale derived from in vivo data to estimate overall aggregation risk.",
"Agg_score_min" : "This metric is the minimum value amongst all Aggregation Scores.",
"Agg_score_max" : "This metric is the maximum value amongst all Aggregation Scores.",
"Agg_score_avg" : "This metric is the arithmetic mean of all Aggregation Scores.",
"Aggregation average score" : "Represents the mean residue-level aggregation propensity across the antibody structure. This provides a normalized estimate of the molecule’s overall tendency to aggregate.",
"percent-sol" : "Relative solubility compared to soluble E. coli proteins.",
"scaled-sol" : "A normalized solubility score indicating predicted solubility based on sequence features. Higher values indicate better solubility.",
"Solubility-Weighted Index" : "Predicted solubility of a protein based on its sequence-derived physicochemical properties. Higher values indicate better solubility.",
"Prob. of Solubility" : "Probability of solubility. Poor solubility can force proteins out of solution at high concentrations, leading to precipitation and aggregation.",
"PSH" : "Patches of Surface Hydrophobicity. Measures the size of solvent-exposed hydrophobic residues in CDR regions. High levels of hydrophobicity, particularly in CDRs, have been implicated in aggregation, viscosity, and polyspecificity.",
"PNC" : "Patches of Negative Charge. Measures the size of solvent-exposed negatively charged residues. Patches of Negative Charge in the CDRs can be linked to high rates of clearance and poor expression levels.",
"PPC" : "Patches of Positive Charge. Measures the size of solvent-exposed positively charged residues in CDR regions. Patches of Positive Charge in the CDRs can be linked to high rates of clearance and poor expression levels.",
"SFvCSP" : "Structural Fv Charge Symmetry Parameter. Measures the charge symmetry of the Fv domain.",
"SCM_pos_Fv" : "Spatial positive charge map score in the Fv region. Higher scores may indicate a greater risk of non-specific interactions, particularly with negatively charged molecules such as heparin. This metric correlates with Heparin Affinity Chromatography (HAC), which is commonly used to assess non-specific binding liabilities in antibodies.",
"SCM_neg_Fv" : "Spatial negative charge map score in the Fv region. Higher scores may indicate a greater risk of non-specific interactions, particularly with negatively charged molecules such as heparin. This metric correlates negatively with Heparin Affinity Chromatography (HAC), which is commonly used to assess non-specific binding liabilities in antibodies.",
"SAP_pos_CDR" : "Spatial aggregation propensity score of CDR regions. There is a moderate correlation between this score and the hydrophobic property (HIC and META Y).",
"SAP_pos_Fv" : "Spatial aggregation propensity score of the Fv domain. There is a moderate correlation between this score and the hydrophobic property (HIC and META Y)."

Appendix

The appendix tables in the report summarize the developability and non-specificity motifs identified. Each row corresponds to a heavy (H) or light (L) chain of an antibody (ID). For each chain, the tables list potential liabilities, specific motifs in the sequence, and their start and end positions. The region column indicates where the motif is located (e.g., CDR1, CDR2), while present in germline shows whether the motif is naturally found in germline sequences. Surface exposure describes whether the motif is solvent-exposed, and present in therapeutics indicates whether similar motifs exist in approved therapeutic antibodies. Motifs that are naturally found in germline sequences or that exist in approved therapeutics can be considered low risk. This information also applies to the liability tables in the individual report of each antibody.

When a column contains multiple values inside brackets [], they are listed in the same order across all columns, meaning the first entry in each column corresponds to the same motif and liability. Empty [] and NaN indicate that no motif or corresponding value was detected for that column.
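
The positional correspondence between bracketed columns can be unpacked as shown below. This is a minimal sketch assuming that each cell is written as a Python-style list literal such as ['NG', 'DG']. The function names and the column names "liability", "motif" and "start" are hypothetical and should be replaced with the actual headers of dev_motifs.csv or ns_motifs.csv.

```python
import ast


def parse_list_cell(cell):
    """Turn a bracketed cell into a Python list; "[]", "NaN" and "" become []."""
    text = str(cell).strip()
    if text in ("", "[]", "NaN", "nan"):
        return []
    value = ast.literal_eval(text)
    return list(value) if isinstance(value, (list, tuple)) else [value]


def align_motifs(row, columns):
    """Combine the i-th entry of every listed column into one record per motif."""
    parsed = [parse_list_cell(row[name]) for name in columns]
    # zip stops at the shortest list, so an empty column yields no records.
    return [dict(zip(columns, values)) for values in zip(*parsed)]


row = {
    "liability": "['Deamidation', 'Isomerization']",
    "motif": "['NG', 'DG']",
    "start": "[31, 54]",
}
for record in align_motifs(row, ["liability", "motif", "start"]):
    print(record)
```

Each printed record groups the liability, motif and start position that belong to the same detected motif.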