About cropPAL2020

Summary

cropPAL2020 is a tool for investigating subcellular localisation in a growing number of crop species. It unifies disparate datasets and provides web services through an accessible interface. Users can construct powerful queries or interrogate their own protein sets, making cropPAL2020 a one-stop shop for protein localisation and protein location relationships. The cropPAL2020 compendium houses large-scale proteomic (MS/MS) and fluorescent protein (FP) localisation data, as well as protein-protein interaction (PPI) data inferred from Arabidopsis PPI experiments. It also contains precompiled bioinformatic predictions of protein subcellular localisation and a consensus call (the winner-takes-all location) that takes both predictive and experimental information into account. The cropPAL2020 search interface provides flexible options for refining or interrogating protein data sets by location, interactions, protein properties and bibliographic information. For bulk downloads of cropPAL releases, please visit Research Data Australia.

Why cropPAL?

Subcellular localisation information can contribute to our understanding of protein function, protein redundancy and biological relationships. Although a variety of technologies are currently used to determine the subcellular location of proteins, much of this information is not available in an integrated form. To get a clearer picture of the existing experimental data and to better understand subcellular partitioning, we have brought together and expanded various data sources to build cropPAL2020. The database has a web-accessible interface that allows advanced combinatorial queries on the data, as well as downloads for downstream applications. All data are referenced and linked to the original source, allowing reuse and dissemination of research data.

The resources in cropPAL2020

cropPAL2020 species annotation information

cropPAL is updated about once a year, so its experimental and computed data grow with every update. The current version, cropPAL2020, is built on Ensembl Plant Mart version 40, as described in the Gramene release 58 notes. Version 2 of cropPAL expanded the curated experimental and precomputed data to 12 crop species. The data are linked to the proteome annotations listed below.

| Species | Assembly | Gene annotation |
|---|---|---|
| Arabidopsis thaliana | TAIR10 | 2016-06-Araport11 |
| Brassica napus | AST_PRJEB5043_v1 | 2015-09-ENA |
| Brassica rapa | IVFCAASv1 | bra_v1.01_SP2010_01 |
| Glycine max | Glycine_max_v2.0 | 2015-11-ENA |
| Hordeum vulgare | IBSC v2 | IBSC_1.0 |
| Musa acuminata | MA1 | 2012-08-Cirad |
| Oryza sativa Japonica | IRGSP-1.0 | IRGSP-1.0 |
| Solanum lycopersicum | SL2.50 | 2014-10-EnsemblPlants |
| Solanum tuberosum | SolTub_3.0 | SolTub_3.0 |
| Sorghum bicolor | Sorghum_bicolor_NCBIv3 | 2017-06-ENA |
| Triticum aestivum | IWGSC | 2018-04-IWGSC |
| Vitis vinifera | IGGP_12x | 2012-07-CRIBI |
| Zea mays | B73 RefGen_v4 | CampbellMaker2016Dec |

cropPAL2020 experimental data

The experimental studies included in cropPAL2020 as of October 2019 are summarised below; studies published after this date will be included in the next update. When experimental data are captured, obsolete or non-Ensembl Plants protein IDs are either cross-referenced to current IDs, or the sequences belonging to these IDs are retrieved and BLASTed against the current proteome. This retains valuable experimental data and links it to current genome standards. The table below shows, for each species, the number of experimental studies linked to the Ensembl proteomes, as well as the number of Arabidopsis PPI studies whose data were transferred to that species through homology linking.

| Species | FP studies | MS/MS studies | Inferred PPI studies |
|---|---|---|---|
| Brassica napus | 11 | 9 | 682 |
| Brassica rapa | 6 | 0 | 672 |
| Glycine max | 1 | 22 | 614 |
| Hordeum vulgare | 33 | 14 | 534 |
| Musa acuminata | 20 | 2 | 575 |
| Oryza sativa | 327 | 55 | 559 |
| Solanum lycopersicum | 40 | 18 | 592 |
| Solanum tuberosum | 13 | 8 | 534 |
| Sorghum bicolor | 5 | 2 | 574 |
| Triticum aestivum | 81 | 25 | 576 |
| Vitis vinifera | 2 | 7 | 619 |
| Zea mays | 89 | 26 | 570 |
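
The ID-remapping step described above can be sketched as follows. This is a minimal illustration, not cropPAL's actual pipeline: the function name `remap_protein_id`, its dictionary inputs, and the order of the steps (cross-reference first, BLAST as a fallback) are assumptions, and `best_blast_hit` is a hypothetical stand-in for a real BLAST search against the current proteome.

```python
from typing import Callable, Dict, Optional


def remap_protein_id(
    old_id: str,
    xrefs: Dict[str, str],
    sequences: Dict[str, str],
    best_blast_hit: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Resolve an obsolete or non-Ensembl Plants protein ID to a current one.

    xrefs:          hypothetical cross-reference table, legacy ID -> current ID
    sequences:      hypothetical table of legacy ID -> protein sequence
    best_blast_hit: hypothetical stand-in for BLASTing a sequence against the
                    current proteome; returns the best-matching ID or None
    """
    # 1. Prefer a direct cross-reference when one exists.
    if old_id in xrefs:
        return xrefs[old_id]
    # 2. Otherwise retrieve the sequence behind the legacy ID ...
    seq = sequences.get(old_id)
    if seq is None:
        return None  # nothing to search with, so the record cannot be linked
    # 3. ... and search it against the current proteome.
    return best_blast_hit(seq)
```

A real BLAST step would also apply identity or E-value cutoffs; the thresholds cropPAL uses are not stated here, so the sketch leaves that decision to `best_blast_hit`.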

The table below details, for each species, the number of experimental localisations, the number of distinct proteins localised, and the number of PPI pairs inferred from Arabidopsis by homology linking.

| Species | Localisations (FP and MS/MS) | Distinct proteins (FP and MS/MS) | PPI pairs |
|---|---|---|---|
| Brassica napus | 478 | 452 | 160291 |
| Brassica rapa | 6 | 6 | 38498 |
| Glycine max | 13972 | 11072 | 79856 |
| Hordeum vulgare | 955 | 822 | 43428 |
| Musa acuminata | 190 | 190 | 83184 |
| Oryza sativa | 14489 | 10978 | 31327 |
| Solanum lycopersicum | 10307 | 8341 | 26042 |
| Solanum tuberosum | 1660 | 1602 | 24284 |
| Sorghum bicolor | 7 | 7 | 32443 |
| Triticum aestivum | 4907 | 4159 | 373597 |
| Vitis vinifera | 207 | 186 | 15125 |
| Zea mays | 14327 | 10544 | 72921 |

cropPAL2020 predictors

Twelve predictors are integrated into cropPAL; they use distinct training data sets, input variables and prediction methods. Their contribution to the SUBA consensus call was reviewed and compared in our recent study on SUBAcon. Predictors vary in accuracy across subcellular compartments; the performance of individual predictors in specific compartments in Arabidopsis is described on the SUBA4 about page. For 6 predictors (MultiLoc2, Predotar, TargetP, YLoc, iPSORT, WoLF PSORT), proteome-wide prediction sets exist for all cropPAL2020 species. The remaining 6 predictors (BaCelLo, PProwler, ChloroP, EpiLoc, PTS1, Plant-mPLoc) were either not accessible or too slow to process complete proteomes at the time of release. For these algorithms, cropPAL2020 provides incomplete prediction sets together with predictions from previous cropPAL versions for identical protein sequences. Additional predictions will be added at the next update.


cropPAL2020 Winner-takes-all locations

The winner-takes-all (WTA) location is a calculated output that attempts to unify predictions with the available experimental data. Unlike a classifier, it is not based on a training set with tested performance. The WTA call is generated by counting each predicted location as 1 vote, however many predictors call it (e.g. 2 mitochondrial predictions = 1 vote for mitochondrion), and adding 1 vote for each experimental localisation (e.g. 2 plastid GFP localisations = 2 votes for plastid). The votes are summed, and the location with the most votes becomes the exclusive location suggestion for the protein. In this example, the votes give a 2:1 result in favour of the plastid, so the final call is plastid. This strategy ensures that experimental evidence has a stronger influence when it is available. If several locations tie for the most votes, all of them are chosen as winners. To avoid skewing the calculation, only predictors with complete proteome-wide prediction sets were used: MultiLoc2, Predotar, TargetP, YLoc, iPSORT and WoLF PSORT.
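
The vote counting described above can be sketched in Python. This is an illustrative reading of the description, not cropPAL's code: the function name and inputs are hypothetical, and it assumes that each distinct predicted location earns one vote regardless of how many predictors call it, while every experimental localisation (FP or MS/MS) earns one vote.

```python
from collections import Counter
from typing import Iterable, List


def winner_takes_all(predicted: Iterable[str], experimental: Iterable[str]) -> List[str]:
    """Return the winning location(s) for one protein.

    predicted:    locations called by the proteome-wide predictors; each
                  distinct location counts once (assumed reading of the rule)
    experimental: one entry per experimental localisation (FP or MS/MS);
                  every entry counts as a separate vote
    """
    votes = Counter(set(predicted))  # 1 vote per distinct predicted location
    votes.update(experimental)       # +1 vote per experimental observation
    if not votes:
        return []                    # no evidence at all, so no call
    top = max(votes.values())
    # Ties are kept: every location sharing the top score wins.
    return sorted(loc for loc, n in votes.items() if n == top)
```

With the example from the text, `winner_takes_all(["mitochondrion", "mitochondrion"], ["plastid", "plastid"])` returns `["plastid"]` (2 votes to 1). Tied winners are returned together, sorted alphabetically so the output is deterministic.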


cropPAL2020 Protein-Protein Interaction Data

Direct experimental evidence of protein-protein interactions in crop species is still sparse. For Arabidopsis, SUBA4 collated over 26,000 experiments showing interactions between two or more proteins, covering a third of the proteome. The sequence similarity of Arabidopsis PPI pairs to proteins in other species may help identify candidate PPI partners in crop species for hypothesis-driven research. In cropPAL2020, we have taken all evidence-based Arabidopsis PPI data and linked them to homologous proteins in the crop species. This yielded over 800,000 suggested PPI partners across the 12 crop species. Homology linking was performed with TreeBeST, as available through Ensembl Plant Mart version 40.
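
A minimal sketch of transferring Arabidopsis PPI pairs to a crop species by homology is shown below. It is a hypothetical illustration rather than cropPAL's pipeline: it assumes a precomputed mapping from each Arabidopsis protein to its crop homologs (for example, taken from the TreeBeST gene trees) and treats every combination of homologs of the two interacting proteins as a candidate pair. The function name and any identifiers used with it are placeholders.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def transfer_ppi(
    arabidopsis_pairs: Iterable[Tuple[str, str]],
    homologs: Dict[str, List[str]],
) -> Set[Tuple[str, ...]]:
    """Project Arabidopsis PPI pairs onto crop proteins via homology.

    arabidopsis_pairs: experimentally supported (protein_a, protein_b) pairs
    homologs:          hypothetical mapping, Arabidopsis ID -> crop homolog IDs;
                       proteins without a crop homolog are simply absent
    """
    inferred: Set[Tuple[str, ...]] = set()
    for a, b in arabidopsis_pairs:
        # Every homolog of a is paired with every homolog of b.
        for crop_a, crop_b in product(homologs.get(a, []), homologs.get(b, [])):
            # Sort so (x, y) and (y, x) are stored as the same unordered pair.
            inferred.add(tuple(sorted((crop_a, crop_b))))
    return inferred
```

Because one Arabidopsis protein can map to several crop homologs, a single Arabidopsis interaction can yield several candidate crop pairs; this multiplication is one reason inferred pair counts can greatly exceed the number of source experiments.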


cropPAL2 Homology Linking

cropPAL2020 uses two strategies to link crop data to other species. The first determines the best match by a reciprocal BLAST of all crop proteins against Arabidopsis. This links each crop protein to its nearest Arabidopsis match and the corresponding description in SUBA4.
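
A reciprocal BLAST is commonly evaluated as a reciprocal best hit (RBH) test: a crop protein and an Arabidopsis protein are linked only if each is the other's top-scoring hit. The sketch below assumes BLAST results have already been parsed into (query, subject, bit score) rows; the function names are hypothetical, and the scoring criterion and any cutoffs cropPAL applies are not stated here.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(crop_vs_ath: Iterable[Hit], ath_vs_crop: Iterable[Hit]) -> Dict[str, str]:
    """Return {crop protein: Arabidopsis protein} for reciprocal best hits."""
    forward = best_hits(crop_vs_ath)  # crop -> best Arabidopsis hit
    reverse = best_hits(ath_vs_crop)  # Arabidopsis -> best crop hit
    # Keep a link only if the Arabidopsis hit points back to the same crop protein.
    return {crop: ath for crop, ath in forward.items() if reverse.get(ath) == crop}
```
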
The second type of homology linking uses the Ensembl Plants gene trees provided by Gramene. The trees were generated using TreeBeST, as described in more detail in a 2016 article in Database. TreeBeST homology linking connects all cropPAL2020 species to each other as well as to Arabidopsis. The gene tree is a more conservative measure than the reciprocal BLAST.

Site Data

The data behind this website are fully accessible through an API described with the OpenAPI Specification (OAS). You can view and download data using our own API page, or inspect and edit the specification in the external Swagger Editor.