Subcellular Location of Plant-Derived Proteins
Global agriculture faces increasing demand for higher crop yields and for larger amounts of more diverse plant-derived protein. Protein subcellular location is a key element in determining protein function and accumulation patterns in plants, and is critical for better harnessing plant energy for yield and plant defence for sustainability. The cropPAL2020 dataset provides a comprehensive subcellular proteomics resource and user interface for exploring global protein distributions within crop cells. It identifies species-specific divergence in protein subcellular location and defines the best species for comparisons to drive compartmentation-based approaches to improving yield, protein composition and resilience in future crop varieties.
Subcellular location can be determined by fluorescent protein tagging or mass spectrometry detection in subcellular purifications, as well as by prediction using protein sequence features. The compendium of crop Proteins with Annotated Locations (cropPAL) collates >800 studies performed by >700 scientists in 45 countries around the world, together with computational data from 12 prediction algorithms. Crops included are banana (Musa acuminata), barley (Hordeum vulgare), canola (Brassica napus), field mustard (Brassica rapa), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum bicolor), soybean (Glycine max), tomato (Solanum lycopersicum), wheat (Triticum aestivum) and wine grape (Vitis vinifera). The data collection, including metadata for proteins and studies, can be searched using the query builder below. The Homology tab allows you to search for location data across all crop species and to compare it with Arabidopsis data from SUBA4.
Find this resource useful? Please cite cropPAL (PubMed, Plant Cell Physiol).
Bulk downloads available at RDA
Previous versions of cropPAL: cropPAL1 (2015), cropPAL2 (2017)
Choose crops below, then build a query from the questions below by pressing the → buttons.
"matches" gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a description that contains leaf but does not contain seed, seeds, seedling, etc.
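To make the +leaf -seed* example concrete, here is a minimal Python sketch of how such a query could be evaluated against a description. It is a simplified emulation written for illustration, not MySQL's own implementation: the function name boolean_match is hypothetical, only the + (required), - (excluded) and trailing * (prefix) operators are handled, and terms without an operator are ignored.

```python
import re


def boolean_match(query: str, text: str) -> bool:
    """Simplified emulation of '+required -excluded prefix*' matching.

    Assumption: this only mimics the three operators described above;
    it is not MySQL's real full-text search logic.
    """
    # Split the text into lower-case words.
    words = re.findall(r"[a-z0-9_]+", text.lower())
    for term in query.lower().split():
        op = term[0] if term[0] in "+-" else ""
        body = term[len(op):]
        if body.endswith("*"):
            # Trailing * matches any word starting with the prefix.
            prefix = body[:-1]
            found = any(word.startswith(prefix) for word in words)
        else:
            # Otherwise the whole word must be present.
            found = body in words
        if op == "+" and not found:
            return False  # a required term is missing
        if op == "-" and found:
            return False  # an excluded term is present
    return True


print(boolean_match("+leaf -seed*", "Leaf senescence protein"))
print(boolean_match("+leaf -seed*", "Expressed in leaf and seedling"))
```

The first description is accepted because it contains leaf and no word starting with seed; the second is rejected because seedling matches seed*.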
Search for proteins that are (or are not) in a list of Identifiers. Enter this list of Identifiers into the box below. See here for a summary of known cross references.
You can use "wildcards" with "like" and "not like", e.g. GO:%.
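The effect of the % wildcard can be sketched with Python's built-in sqlite3 module, whose LIKE operator treats % the same way for this simple case. The table name xref, the column name identifier and the sample identifiers are assumptions made for illustration; they are not cropPAL's actual schema or data.

```python
import sqlite3

# In-memory database with a hypothetical cross-reference table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xref (identifier TEXT)")
conn.executemany(
    "INSERT INTO xref VALUES (?)",
    [("GO:0005739",), ("GO:0009507",), ("PF00069",)],
)


def like(pattern: str, negate: bool = False) -> list:
    """Return identifiers that are (or are not) LIKE the pattern."""
    op = "NOT LIKE" if negate else "LIKE"
    rows = conn.execute(
        f"SELECT identifier FROM xref WHERE identifier {op} ? "
        "ORDER BY identifier",
        (pattern,),
    )
    return [row[0] for row in rows]


print(like("GO:%"))               # ['GO:0005739', 'GO:0009507']
print(like("GO:%", negate=True))  # ['PF00069']
```

Here % stands for any run of characters, so GO:% selects every identifier that begins with GO:, and "not like" selects the rest.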
"matches" gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a title/abstract that contains leaf but does not contain seed, seeds, seedling, etc.
Reciprocal Blast
... Arabidopsis orthologs with BLAST match score greater than (must be a number) and Arabidopsis consensus location in Subcellular Location:
EnsemblPlants Homology Tree
... any homology with identity greater than (must be a number) and homology type of organism type and has experimentally localized (by MS/MS or GFP) it in:
Bit Score is log2(Neff) - log2(E-value), where E-value = pval × Neff is the p-value times the effective search space size. The larger the bit score the better, since pval = P(random sequence having a better score) = 2^(-bit score). The p-value measures the statistical significance of the match, but since we tried Neff times to find a match we need to make a correction. Multiplying by the number of possible matches gives the E-value, or the expected number of hits with a better match just by random chance. (See here and here [PDF].)
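The relationships between bit score, p-value and E-value described above can be checked with a short Python sketch. The function names and the example search space size are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def p_value(bit_score: float) -> float:
    """pval = 2^(-bit score)."""
    return 2.0 ** (-bit_score)


def e_value(bit_score: float, n_eff: float) -> float:
    """E-value = pval x Neff (correction for the search space size)."""
    return p_value(bit_score) * n_eff


def bit_score_from_e_value(e_val: float, n_eff: float) -> float:
    """Bit score = log2(Neff) - log2(E-value)."""
    return math.log2(n_eff) - math.log2(e_val)


# Hypothetical example: effective search space of 2^20, bit score of 30.
n_eff = 2.0 ** 20
e_val = e_value(30.0, n_eff)
print(e_val)                                 # 2^-10 = 0.0009765625
print(bit_score_from_e_value(e_val, n_eff))  # 30.0
```

With the search space held fixed, each additional bit of score halves the E-value, which is why a larger bit score indicates a better match.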