Subcellular Location of Plant-Derived Proteins

Global agriculture faces increasing demand for higher crop yields and for larger amounts of more diverse plant-derived protein. Protein subcellular location is a key element in determining protein function and accumulation patterns in plants, and is critical for better harnessing plant energy for yield and plant defence for sustainability. The cropPAL2020 dataset provides a comprehensive subcellular proteomics resource and user interface for exploring global protein distributions within crop cells. It identifies species-specific divergence in protein subcellular location and defines the best species for comparisons to drive compartmentation-based approaches to improve yield, protein composition and resilience in future crop varieties.

Subcellular location can be determined by fluorescent protein tagging or by mass spectrometry detection in subcellular purifications, as well as by prediction using protein sequence features. The compendium of crop Proteins with Annotated Locations (cropPAL) collates >800 studies performed by >700 scientists in 45 countries around the world, together with computational data from 12 prediction algorithms. The crops included are banana (Musa acuminata), barley (Hordeum vulgare), canola (Brassica napus), field mustard (Brassica rapa), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum bicolor), soybean (Glycine max), tomato (Solanum lycopersicum), wheat (Triticum aestivum) and wine grape (Vitis vinifera). The data collection, including metadata for proteins and studies, can be searched using the query builder below. The Homology tab allows you to search for location data across all crop species and to compare it with Arabidopsis data from SUBA4.

Find this resource useful? Please cite cropPAL (PubMed, Plant Cell Physiol).
Bulk downloads available at RDA
Previous versions of cropPAL: cropPAL1 (2015), cropPAL2 (2017)

Choose crops below, then build a query from the questions below by pressing the → buttons.

Queries appear here....
... Organism is any of
... or try a quick text query:
Find proteins where the... (To start a query or add another filter to your query select a filter below and press the → button.)
Test for experimental (e.g. FP or MS/MS) evidence of subcellular location.
... experimental location inferred by to be in
Many proteins have no experimental evidence for their subcellular location. Check what the predictors think.
... predicted location inferred by to be in
Search for or exclude proteins by keywords. The search is conducted against the descriptions of proteins in the cropPAL database. The syntax of this search supports extended regular expressions (see this site for more information). Choosing "matches" gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a description that contains leaf but does not contain seed, seeds, seedling etc.
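The +leaf -seed* example can be illustrated with a small Python sketch. This is not the code behind the query builder: the function name boolean_match is hypothetical, and the sketch only imitates the required (+), excluded (-) and trailing-wildcard (*) operators described above. It ignores other aspects of MySQL full-text search such as stopwords, minimum word length and relevance ranking.

```python
import re

def boolean_match(query: str, description: str) -> bool:
    """Rough imitation of '+term' (must occur) and '-term' (must not occur),
    where a trailing '*' turns the term into a prefix wildcard.
    Hypothetical helper for illustration only."""
    words = re.findall(r"[a-z0-9_]+", description.lower())
    for token in query.lower().split():
        operator = token[0]
        term = token.lstrip("+-")
        is_prefix = term.endswith("*")
        term = term.rstrip("*")
        hit = any(w.startswith(term) if is_prefix else w == term for w in words)
        if operator == "+" and not hit:
            return False   # a required term is missing
        if operator == "-" and hit:
            return False   # an excluded term is present
        # tokens without an operator are treated as optional and ignored here
    return True

print(boolean_match("+leaf -seed*", "Leaf senescence protein"))
print(boolean_match("+leaf -seed*", "Expressed in leaf and seedling"))
```

The first call matches because leaf is present and no word starts with seed; the second is rejected because seedling starts with seed.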
... protein description keyword(s)
Filter for proteins based on various numeric data derived from sequence data. GRAVY is defined in Kyte J., Doolittle R.F.: Mol. Biol. 157:105-132(1982) "A simple method for displaying the hydropathic character of a protein" doi:10.1016/0022-2836(82)90515-0. PMID: 7108955.
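As a rough illustration of the GRAVY property, the sketch below averages the Kyte-Doolittle hydropathy values over the residues of a sequence. The function name gravy and the choice to skip non-standard residue codes are assumptions made for this example; how cropPAL handles such residues is not stated here.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.
    Residues outside the 20 standard codes are skipped (an assumption)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)

# Positive values indicate a more hydrophobic protein, negative a more hydrophilic one.
print(gravy("MKTLLVAG"))
```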
... physical property of is ← Should be a number
Filter for proteins that are translated from genes on a specific chromosome or assembly (or set of scaffolds!).
... gene model is on

Search for proteins that are (or are not) in a list of identifiers. Enter the list of identifiers into the box below. See here for a summary of known cross references.

You can use "wildcards" with "like" and "not like" e.g. GO:%.
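The effect of a "like" wildcard such as GO:% can be sketched with SQL LIKE, where % stands for any run of characters. The example uses Python's built-in sqlite3 module as a stand-in for the site's database; the table name xref, the helper proteins_like and the protein names are made up for illustration and do not reflect the cropPAL schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xref (protein TEXT, identifier TEXT)")
conn.executemany(
    "INSERT INTO xref VALUES (?, ?)",
    [
        ("protA", "GO:0008270"),   # made-up protein/identifier pairings
        ("protA", "IPR017986"),
        ("protB", "IPR001680"),
    ],
)

def proteins_like(pattern: str) -> list:
    """Return proteins having at least one identifier matching the LIKE pattern."""
    rows = conn.execute(
        "SELECT DISTINCT protein FROM xref WHERE identifier LIKE ? ORDER BY protein",
        (pattern,),
    )
    return [row[0] for row in rows]

print(proteins_like("GO:%"))   # proteins with any GO term
print(proteins_like("IPR%"))   # proteins with any InterPro entry
```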

... EnsemblPlants identifier(s), alias or protein sequence feature is the list of: (e.g. GO:0008270, IPR017986 etc.)
Find proteins with ... (To start a query or add another filter to your query select a filter below and press the → button.)

Reciprocal Blast

... Arabidopsis orthologs with blast match score greater than ← must be a number and Arabidopsis consensus location in Subcellular Location:
Search for proteins that are homologous to proteins from another crop in a certain subcellular location. The homology types included are orthologs (any pairwise gene relation where the ancestor node is a speciation event) and paralogs (any pairwise gene relation where the ancestor node is a duplication event). For the more specific classifications displayed in the query results, please refer to: EnsemblPlants Protein trees
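The ortholog/paralog distinction above can be sketched on a toy gene tree: find the most recent common ancestor node of two genes and look at its event type. The Node class, the helper names and the gene names are hypothetical and chosen for this example only; they are not taken from EnsemblPlants.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    event: Optional[str] = None          # "speciation" or "duplication" on internal nodes
    children: list = field(default_factory=list)

def path_to(node, leaf_name, trail=()):
    """Return the tuple of nodes from the root down to the named leaf, or None."""
    trail = trail + (node,)
    if not node.children and node.name == leaf_name:
        return trail
    for child in node.children:
        found = path_to(child, leaf_name, trail)
        if found:
            return found
    return None

def homology_type(root, gene_a, gene_b):
    """Classify a gene pair by the event at their most recent common ancestor."""
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    ancestor = None
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        ancestor = x                      # last node shared by both paths
    if ancestor.event == "speciation":
        return "ortholog"
    if ancestor.event == "duplication":
        return "paralog"
    raise ValueError("common ancestor has no event (same gene given twice?)")

# Toy tree: a duplication within one species, below a speciation node.
tree = Node("root", "speciation", [
    Node("dup1", "duplication", [Node("riceA1"), Node("riceA2")]),
    Node("maizeA"),
])
print(homology_type(tree, "riceA1", "maizeA"))
print(homology_type(tree, "riceA1", "riceA2"))
```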

EnsemblPlants Homology Tree

... any homology with identity greater than ← must be a number and homology type of organism type and has experimentally localized (by MS/MS or GFP) it in:
... author's name like
Select a paper by PubMed
Select a paper by author name
... year of publication of localisation studies is between and ← Should be a year
Search for or exclude proteins by keywords. The search is conducted against the literature titles and abstracts in the SUBA database. The syntax of this search supports extended regular expressions (see this site for more information). Choosing "matches" gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a title/abstract that contains leaf but does not contain seed, seeds, seedling etc.
... publication title or abstract of the localisation study the keyword(s)
... author's affiliation
... author's affiliation in
Search for proteins by author affiliation
... author's affiliation in
Search for proteins that match this blast. Press the ‘Clear’ button to delete any content from the box.
... Protein contains fragments in list with bit-score ...
Sequences against cropPAL 2020 CropPAL2v2 database

Bit score is log2(Neff) - log2(E-value), where E-value = pval × Neff is the p-value times the effective search space size Neff. The larger the bit score the better, since pval = P(random sequence having a better score) = 2^(-bit score). The p-value measures the statistical significance of the match, but since we tried Neff times to find a match we need to make a correction. Multiplying by the number of possible matches gives the E-value, or the expected number of hits with a better match just by random chance. (See here and here [PDF]).
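The relationships between bit score, p-value and E-value described above can be checked numerically with a short Python sketch. The function names and the search space size used in the example are illustrative assumptions; the actual Neff of the cropPAL BLAST database is not given here.

```python
import math

def bit_score(e_value: float, n_eff: float) -> float:
    """Bit score = log2(Neff) - log2(E-value)."""
    return math.log2(n_eff) - math.log2(e_value)

def p_value(bits: float) -> float:
    """pval = 2^(-bit score)."""
    return 2.0 ** (-bits)

def e_value(bits: float, n_eff: float) -> float:
    """E-value = pval x Neff."""
    return p_value(bits) * n_eff

n_eff = 1024.0                 # assumed effective search space size
bits = bit_score(1.0, n_eff)   # an E-value of 1 corresponds to log2(Neff) bits
print(bits, p_value(bits), e_value(bits, n_eff))
```

Each additional bit halves the p-value, so at a fixed search space size it also halves the expected number of chance hits.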