About cropPAL2020
Summary
cropPAL2020 provides a powerful tool to investigate subcellular localisation in a growing number of crop species by unifying disparate datasets and providing web services through an accessible interface. Users can construct powerful queries or interrogate their own protein sets, making cropPAL2020 a one-stop shop for protein localisation and protein location relationships. The cropPAL2020 compendium houses large-scale proteomic (MS/MS) and fluorescent protein (FP) localisation data as well as protein-protein interactions (PPI) inferred from Arabidopsis PPI experiments. It also contains precompiled bioinformatic predictions of protein subcellular localisation and a consensus call (winner-takes-all location) that takes both predictive and experimental information into account. The cropPAL2020 search interface provides flexible options for refining or interrogating protein data sets by location, interactions, protein properties and bibliographic information. For bulk downloads of cropPAL releases please visit Research Data Australia.
Why cropPAL?
Subcellular localisation information can contribute towards our understanding of protein function, protein redundancy and biological relationships. While a variety of technologies are currently employed to determine the subcellular location of proteins, much of this information is not available in an integrated manner. To get a clearer picture of existing experimental data and to better understand subcellular partitioning, we have brought together and expanded various data sources to build cropPAL2020. The database has a web-accessible interface that allows advanced combinatorial queries on the data as well as downloads for downstream applications. All data are referenced and linked to the original source, allowing reuse and dissemination of research data.
The resources in cropPAL2020
cropPAL2020 species annotation information
cropPAL2020 is updated about once a year, so the experimental and computed data increase with every update. The current version, cropPAL2020, is built on Ensembl Plant Mart version 40 as described in the Gramene release 58 notes. Version 2 of cropPAL expanded the curated experimental and precomputed data to 12 species. The data were linked to the proteome annotations listed below.
Species | Assembly | Gene annotation |
---|---|---|
Arabidopsis thaliana | TAIR10 | 2016-06-Araport11 |
Brassica napus | AST_PRJEB5043_v1 | 2015-09-ENA |
Brassica rapa | IVFCAASv1 | bra_v1.01_SP2010_01 |
Glycine max | Glycine_max_v2.0 | 2015-11-ENA |
Hordeum vulgare | IBSC v2 | IBSC_1.0 |
Musa acuminata | MA1 | 2012-08-Cirad |
Oryza sativa Japonica | IRGSP-1.0 | IRGSP-1.0 |
Solanum lycopersicum | SL2.50 | 2014-10-EnsemblPlants |
Solanum tuberosum | SolTub_3.0 | SolTub_3.0 |
Sorghum bicolor | Sorghum_bicolor_NCBIv3 | 2017-06-ENA |
Triticum aestivum | IWGSC | 2018-04-IWGSC |
Vitis vinifera | IGGP_12x | 2012-07-CRIBI |
Zea mays | B73 RefGen_v4 | CampbellMaker2016Dec |
cropPAL2020 experimental data
An overview of the experimental studies in cropPAL2020 as of October 2019 is shown below. Studies published after this date will be included in the next update. When experimental data are captured, obsolete or non-Ensembl Plants protein IDs are either cross-referenced, or the sequences belonging to these IDs are retrieved and BLASTed against the current proteome. This helps retain valuable experimental data and links them to current genome standards. The table below shows the number of studies that have been linked to the Ensembl proteomes as well as the number of PPI studies that were derived from Arabidopsis through homology linking.
Species | FP studies | MS/MS studies | Inferred PPI studies |
---|---|---|---|
Brassica napus | 11 | 9 | 682 |
Brassica rapa | 6 | 0 | 672 |
Glycine max | 1 | 22 | 614 |
Hordeum vulgare | 33 | 14 | 534 |
Musa acuminata | 20 | 2 | 575 |
Oryza sativa | 327 | 55 | 559 |
Solanum lycopersicum | 40 | 18 | 592 |
Solanum tuberosum | 13 | 8 | 534 |
Sorghum bicolor | 5 | 2 | 574 |
Triticum aestivum | 81 | 25 | 576 |
Vitis vinifera | 2 | 7 | 619 |
Zea mays | 89 | 26 | 570 |
The table below details, for each species, the number of experimental localisations, the number of distinct proteins localised and the number of available PPI pairs inferred from Arabidopsis by homology linking.
Species | Localisations (FP and MS/MS) | Distinct proteins (FP and MS/MS) | PPI pairs |
---|---|---|---|
Brassica napus | 478 | 452 | 160291 |
Brassica rapa | 6 | 6 | 38498 |
Glycine max | 13972 | 11072 | 79856 |
Hordeum vulgare | 955 | 822 | 43428 |
Musa acuminata | 190 | 190 | 83184 |
Oryza sativa | 14489 | 10978 | 31327 |
Solanum lycopersicum | 10307 | 8341 | 26042 |
Solanum tuberosum | 1660 | 1602 | 24284 |
Sorghum bicolor | 7 | 7 | 32443 |
Triticum aestivum | 4907 | 4159 | 373597 |
Vitis vinifera | 207 | 186 | 15125 |
Zea mays | 14327 | 10544 | 72921 |
cropPAL2020 predictors
There are 12 predictors integrated into cropPAL, which use distinct training data sets, input variables and prediction methods. These have been reviewed and compared for their contribution to the SUBA consensus call in our recent study about SUBAcon. Predictors vary in their accuracy for each subcellular compartment. Information describing the performance of individual predictors in specific subcellular compartments in Arabidopsis can be found on the SUBA4 about page. For 6 predictors (MultiLoc2, predotar, targetP, YLoc, iPSORT, WolfPSORT), proteome-wide prediction sets exist for all cropPAL2020 species. The remaining 6 predictors (BaCeLo, PProwler, ChloroP, EpiLOC, PTS1, Plant-mPLoc) were either not accessible or too slow to complete the proteome at the time of release. For these algorithms, cropPAL2020 provides incomplete prediction sets together with predictions carried over from previous cropPAL versions for identical protein sequences. Additional predictions will be added at the next update.
cropPAL2020 Winner-takes-all locations
The winner-takes-all (WTA) location is a calculated output that attempts to unify predictions and available experimental data. It differs from a classifier, which is based on a training set and has a tested performance. The WTA call is generated by counting all predictions for a given location as a single vote (e.g. 2 x mitochondrial predictions = 1 vote for mitochondrion) and adding each experimental verification as one vote (e.g. 2 x plastidal GFP localisations = 2 votes for plastid). The votes are added up and the location(s) with the most votes become the exclusive location suggestion for the protein. In the example, the votes would result in a 2:1 outcome in favour of the plastid, giving a final call of plastid. This strategy ensures a stronger influence of experimental verification when it is available. If several locations tie for the most votes, all of them are chosen. To avoid skewing the calculation, only predictors with complete proteome prediction sets were used: MultiLoc2, predotar, targetP, YLoc, iPSORT and WolfPSORT.
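The voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration based only on the description on this page, not the code used to build cropPAL2020; the function name `winner_takes_all` and the plain-string location labels are assumptions made for the example.

```python
from collections import Counter


def winner_takes_all(predicted, experimental):
    """Return the winning location(s) for one protein.

    predicted    -- locations called by the predictors (hypothetical input format)
    experimental -- locations observed experimentally, one entry per observation
    """
    votes = Counter()
    # All predictions for the same location collapse into a single vote.
    for location in set(predicted):
        votes[location] += 1
    # Every experimental observation counts as its own vote.
    for location in experimental:
        votes[location] += 1
    if not votes:
        return []
    top = max(votes.values())
    # Ties are kept: every location with the top score is returned.
    return sorted(loc for loc, n in votes.items() if n == top)


# Example from the text: two mitochondrial predictions, two plastid GFP results.
call = winner_takes_all(
    predicted=["mitochondrion", "mitochondrion"],
    experimental=["plastid", "plastid"],
)
print(call)  # ['plastid']
```

With no experimental data, two predictors disagreeing on the location would produce a tie, and both locations would be returned.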
cropPAL2020 Protein-Protein-Interaction Data
Direct experimental evidence of protein-protein interactions (PPI) in crop species is still sparse. For Arabidopsis, SUBA4 collated over 26000 experiments that showed the interaction of two or more proteins, covering a third of the proteome. Sequence similarity between Arabidopsis PPI partners and proteins in other species may help find PPI partners in crop species for hypothesis-driven research. In cropPAL2020, we have taken all evidence-based Arabidopsis PPI data and linked them to homologous proteins in crop species. This yielded over 800000 suggested PPI partners across the 12 crop species. The homology linking was performed using TreeBeST, as available through Ensembl Plant Mart version 40.
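As a rough illustration of how Arabidopsis PPI pairs might be carried over to a crop species, the sketch below projects each Arabidopsis pair onto the homologues of both partners. The function name `project_ppi`, the dictionary format and the protein IDs are placeholders, and pairing every homologue of one partner with every homologue of the other is an assumption; this page does not state how multiple homologues are handled.

```python
from itertools import product


def project_ppi(arabidopsis_pairs, homologues):
    """Suggest crop PPI pairs from Arabidopsis pairs (illustrative only).

    arabidopsis_pairs -- iterable of (protein_a, protein_b) tuples
    homologues        -- dict mapping an Arabidopsis ID to a list of crop IDs
    """
    suggested = set()
    for ath_a, ath_b in arabidopsis_pairs:
        crop_as = homologues.get(ath_a, [])
        crop_bs = homologues.get(ath_b, [])
        # Assumption: every homologue combination is a suggested pair.
        for crop_a, crop_b in product(crop_as, crop_bs):
            # Store pairs in sorted order so (x, y) and (y, x) are not duplicated.
            suggested.add(tuple(sorted((crop_a, crop_b))))
    return suggested


# Placeholder IDs, not real database entries.
pairs = [("ATH_P1", "ATH_P2")]
links = {"ATH_P1": ["CROP_A1", "CROP_A2"], "ATH_P2": ["CROP_B1"]}
print(sorted(project_ppi(pairs, links)))
```

An Arabidopsis pair is dropped when either partner has no homologue in the crop species, which is one reason the number of suggested pairs can differ widely between species.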
cropPAL2020 Homology Linking
There are two strategies in cropPAL2020 for linking crop data to other species. The first approach, used to determine the best match, is a reciprocal BLAST of all crop proteins against Arabidopsis. This generates a link to the nearest match and its description in SUBA4.
The second type of homology linking uses the Ensembl Plants gene tree as provided by Gramene. The tree was generated using TreeBeST and was described in more detail in 2016 in the journal Database. The TreeBeST homology linking links all cropPAL2020 species to each other as well as to Arabidopsis. The gene tree is a more conservative measure than the reciprocal BLAST.
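For readers unfamiliar with reciprocal BLAST, the sketch below shows one common way to derive reciprocal best hits from two sets of BLAST results. It is an illustration under stated assumptions, not the cropPAL2020 pipeline: the scoring criterion (highest bit score), the `(query, subject, bitscore)` tuple format, the function names and the IDs are all hypothetical, and this page does not specify the thresholds actually used.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits -- iterable of (query, subject, bitscore) tuples (assumed format)
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(crop_vs_ath, ath_vs_crop):
    """Keep crop-to-Arabidopsis links that are confirmed in both directions."""
    forward = best_hits(crop_vs_ath)
    backward = best_hits(ath_vs_crop)
    return {
        crop: ath
        for crop, ath in forward.items()
        if backward.get(ath) == crop
    }


# Placeholder IDs and scores for illustration.
crop_vs_ath = [("CROP_1", "ATH_1", 250.0), ("CROP_1", "ATH_2", 90.0),
               ("CROP_2", "ATH_1", 120.0)]
ath_vs_crop = [("ATH_1", "CROP_1", 248.0), ("ATH_1", "CROP_2", 118.0)]
print(reciprocal_best_hits(crop_vs_ath, ath_vs_crop))  # {'CROP_1': 'ATH_1'}
```

In this toy example CROP_2 finds ATH_1 as its best match, but the link is not kept because ATH_1 matches CROP_1 better in the reverse search.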
Site Data
The data backing this website is fully accessible through an API described using the OpenAPI Specification (OAS). You can view and download data using our own page, or inspect and edit the specification in the offsite Swagger editor.