CropPAL2020

About cropPAL²⁰²⁰

Summary

cropPAL²⁰²⁰ is a tool for investigating subcellular localisation in a growing number of crop species. It unifies disparate datasets and provides web services through an accessible interface. Users can construct combinatorial queries or interrogate their own protein sets, making it a one-stop shop for protein localisation and protein location relationships. The cropPAL²⁰²⁰ compendium houses large-scale proteomic (MS/MS) and fluorescent protein (FP) localisation data, as well as Protein-Protein Interaction (PPI) data inferred from Arabidopsis PPI experimentation. It also contains precompiled bioinformatic predictions of protein subcellular localisation and a consensus call (winner-takes-all location) that takes both predictive and experimental information into account. The cropPAL²⁰²⁰ search interface provides flexible options for refining or interrogating protein data sets by location, interactions, protein properties and bibliographic information. For bulk downloads of cropPAL releases please visit Research Data Australia.

Why cropPAL? Subcellular localisation information can contribute towards our understanding of protein function, protein redundancy and biological relationships. While a variety of technologies are currently employed to determine the subcellular location of proteins, much of this information is not available in an integrated manner. To get a clearer picture of existing experimental data and to better understand subcellular partitioning, we have brought together and expanded various data sources to build cropPAL²⁰²⁰. The database has a web-accessible interface that allows advanced combinatorial queries on the data as well as downloads for downstream applications. All data is referenced and linked to the original source, allowing reuse and dissemination of research data.

The resources in cropPAL²⁰²⁰

cropPAL²⁰²⁰ species annotation information

cropPAL²⁰²⁰ is updated about once a year, which means experimental and computed data increase with every update. The current version, cropPAL²⁰²⁰, is built on Ensembl Plant Mart version 40, described in the Gramene release 58 notes. Version 2 of cropPAL expanded the curated experimental and precomputed data to 12 crop species. The data was linked to the proteome annotations listed below.

Species	Assembly	Gene annotation
Arabidopsis thaliana	TAIR10	2016-06-Araport11
Brassica napus	AST_PRJEB5043_v1	2015-09-ENA
Brassica rapa	IVFCAASv1	bra_v1.01_SP2010_01
Glycine max	Glycine_max_v2.0	2015-11-ENA
Hordeum vulgare	IBSC v2	IBSC_1.0
Musa acuminata	MA1	2012-08-Cirad
Oryza sativa Japonica	IRGSP-1.0	IRGSP-1.0
Solanum lycopersicum	SL2.50	2014-10-EnsemblPlants
Solanum tuberosum	SolTub_3.0	SolTub_3.0
Sorghum bicolor	Sorghum_bicolor_NCBIv3	2017-06-ENA
Triticum aestivum	IWGSC	2018-04-IWGSC
Vitis vinifera	IGGP_12x	2012-07-CRIBI
Zea mays	B73 RefGen_v4	CampbellMaker2016Dec

cropPAL²⁰²⁰ experimental data

An overview of the experimental studies in cropPAL²⁰²⁰ as of October 2019 is shown below. Studies published after this date will be included in the next update. When the experimental data is captured, obsolete or non-Ensembl Plants protein IDs are either cross-referenced, or the sequences belonging to these IDs are retrieved and BLASTed against the current proteome. This helps retain valuable experimental data and links it to current genome standards. The table below shows the number of studies that have been linked to the Ensembl proteomes as well as the number of PPI studies that were derived from Arabidopsis through homology linking.

Species	FP studies	MS/MS studies	Inferred PPI studies
Brassica napus	11	9	682
Brassica rapa	6	0	672
Glycine max	1	22	614
Hordeum vulgare	33	14	534
Musa acuminata	20	2	575
Oryza sativa	327	55	559
Solanum lycopersicum	40	18	592
Solanum tuberosum	13	8	534
Sorghum bicolor	5	2	574
Triticum aestivum	81	25	576
Vitis vinifera	2	7	619
Zea mays	89	26	570

The table below details, for each species, the number of experimental localisations, the number of distinct proteins localised and the number of available PPI pairs inferred from Arabidopsis by homology linking.

Species	localisations (FP and MSMS)	distinct proteins (FP and MSMS)	PPI pairs
Brassica napus	478	452	160291
Brassica rapa	6	6	38498
Glycine max	13972	11072	79856
Hordeum vulgare	955	822	43428
Musa acuminata	190	190	83184
Oryza sativa	14489	10978	31327
Solanum lycopersicum	10307	8341	26042
Solanum tuberosum	1660	1602	24284
Sorghum bicolor	7	7	32443
Triticum aestivum	4907	4159	373597
Vitis vinifera	207	186	15125
Zea mays	14327	10544	72921

cropPAL²⁰²⁰ predictors

There are 12 predictors integrated into cropPAL, which use distinct training data sets, input variables and prediction methods. These have been reviewed and compared for their contribution to the SUBA consensus call in our recent study about SUBAcon. Predictors vary in their accuracy for each subcellular compartment. Information describing the performance of individual predictors in specific subcellular compartments in Arabidopsis can be found on the SUBA4 about page. For 6 predictors (MultiLoc2, predotar, targetP, YLoc, iPSORT, WolfPSORT) proteome-wide prediction sets exist for all cropPAL²⁰²⁰ species. The remaining 6 predictors (BaCeLo, PProwler, ChloroP, EpiLOC, PTS1, Plant-mPLoc) were either not accessible or too slow to complete the proteome at the time of release. For these algorithms, cropPAL²⁰²⁰ provides incomplete prediction sets as well as predictions from previous cropPAL versions for identical protein sequences. Additional predictions will be added at the next update.

cropPAL²⁰²⁰ Winner-takes-all locations

The winner-takes-all (WTA) call is a calculated output that attempts to unify predictions and available experimental data. This is different to a classifier, which is based on a training set and has a tested performance. The WTA call is generated by counting each predicted location as 1 vote (e.g. 2 x mitochondrial predictions = 1 vote for mitochondrion) and adding each experimental verification as 1 vote (e.g. 2 x plastidal GFP localisations = 2 votes for plastid). The votes are added up and the location with the most votes becomes the exclusive location suggestion for this protein. In the example, the votes would result in a 2:1 outcome in favour of the plastid, with a final call of plastid. This strategy ensures a stronger influence of experimental verifications when they are available. If several locations tie for the highest number of votes, all of them are reported as winners. To avoid skewing the calculations, only complete proteome prediction data sets were used for this calculation. These came from the predictors MultiLoc2, predotar, targetP, YLoc, iPSORT and WolfPSORT.
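The voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the cropPAL implementation: the function name `winner_takes_all` and its inputs are hypothetical, and it assumes that each distinct predicted location receives a single vote however many predictors agree on it, as the mitochondrial example suggests.

```python
from collections import Counter


def winner_takes_all(predicted, experimental):
    """Return the winning location(s), sorted alphabetically.

    predicted:    locations called by predictors (repeats allowed)
    experimental: locations from experiments, one entry per observation
    """
    votes = Counter()
    # Assumption: every distinct predicted location gets exactly one vote,
    # no matter how many predictors call it.
    for location in set(predicted):
        votes[location] += 1
    # Each experimental observation counts as its own vote.
    for location in experimental:
        votes[location] += 1
    if not votes:
        return []
    top = max(votes.values())
    # Ties are kept: every location with the top vote count is a winner.
    return sorted(loc for loc, n in votes.items() if n == top)


# The example from the text: 2 mitochondrial predictions, 2 plastidal GFP results.
print(winner_takes_all(
    predicted=["mitochondrion", "mitochondrion"],
    experimental=["plastid", "plastid"],
))  # ['plastid']
```

With one cytosolic prediction and one nuclear experimental result, the same function returns both locations, which mirrors the tie rule.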

cropPAL²⁰²⁰ Protein-Protein-Interaction Data

Direct experimental evidence of Protein-Protein Interactions in crop species is still sparse. For Arabidopsis, SUBA4 collated over 26000 experiments that showed the interaction of two or more proteins, covering a third of the proteome. Sequence similarity between the proteins in Arabidopsis PPI pairs and proteins in other species may help find PPI partners in crop species for hypothesis-driven research. In cropPAL²⁰²⁰, we have taken all evidence-based Arabidopsis PPI data and linked it to homologous proteins in crop species. This yielded over 800000 suggested PPI partners across the 12 crop species. The homology linking was performed using TreeBeST homology linking as available through Ensembl Plant Mart version 40.
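As a rough illustration of how interactions can be carried over by homology, the sketch below maps each Arabidopsis PPI pair onto every combination of crop homologs of the two partners. It is an assumption about the general idea only: the function `infer_ppi`, the homolog table and all identifiers are hypothetical placeholders, and the actual cropPAL pipeline relies on the TreeBeST gene tree rather than a simple lookup table.

```python
from itertools import product


def infer_ppi(arabidopsis_pairs, homologs):
    """Suggest crop PPI pairs from Arabidopsis pairs.

    arabidopsis_pairs: iterable of (protein_a, protein_b) tuples
    homologs:          dict mapping an Arabidopsis protein to its crop homologs
    """
    inferred = set()
    for ath_a, ath_b in arabidopsis_pairs:
        crop_a = homologs.get(ath_a, ())
        crop_b = homologs.get(ath_b, ())
        # Every homolog of A is paired with every homolog of B.
        for a, b in product(crop_a, crop_b):
            # Store pairs in sorted order so (x, y) and (y, x) are one entry.
            inferred.add(tuple(sorted((a, b))))
    return inferred


# Placeholder identifiers, not real accessions.
pairs = [("ATH_A", "ATH_B")]
table = {"ATH_A": ["CROP_1", "CROP_2"], "ATH_B": ["CROP_3"]}
print(sorted(infer_ppi(pairs, table)))
```

A pair is only suggested when both Arabidopsis partners have at least one homolog in the crop species.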

cropPAL²⁰²⁰ Homology Linking

There are two strategies in cropPAL²⁰²⁰ for linking crop data to other species. The first approach, used to determine the best match, is a reciprocal BLAST of all crop proteins against Arabidopsis. This generates a link to the nearest match and its description in SUBA4.
The second type of homology linking is performed using the Ensembl Plants gene tree as provided by Gramene. The tree was generated using TreeBeST and was described in more detail in 2016 in Database. The TreeBeST homology linking links all cropPAL²⁰²⁰ species to each other as well as to Arabidopsis. The gene tree is a more conservative measure than the reciprocal BLAST.
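The reciprocal BLAST idea can be illustrated with a short sketch that takes hit lists from both search directions and keeps only pairs that are each other's best hit. This is a generic reciprocal-best-hit filter written under assumptions: the source does not state how hits are scored or filtered, so the use of a bit score, the function names and the identifiers are all hypothetical.

```python
def best_hits(hits):
    """Return {query: subject} keeping the highest-scoring subject per query.

    hits: iterable of (query, subject, score) tuples, e.g. parsed BLAST output
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(crop_vs_ath, ath_vs_crop):
    """Keep crop/Arabidopsis pairs that are each other's best hit."""
    forward = best_hits(crop_vs_ath)
    backward = best_hits(ath_vs_crop)
    return {crop: ath for crop, ath in forward.items()
            if backward.get(ath) == crop}


# Placeholder identifiers and scores, for illustration only.
crop_vs_ath = [("CROP_1", "ATH_A", 250.0), ("CROP_1", "ATH_B", 90.0),
               ("CROP_2", "ATH_A", 120.0)]
ath_vs_crop = [("ATH_A", "CROP_1", 245.0), ("ATH_A", "CROP_2", 118.0)]
print(reciprocal_best_hits(crop_vs_ath, ath_vs_crop))  # {'CROP_1': 'ATH_A'}
```

Here CROP_2 also hits ATH_A, but ATH_A's best hit is CROP_1, so only the CROP_1 link survives the reciprocal check.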

Site Data

The data backing this website is fully accessible via the OpenAPI Specification (OAS). You can view and download data using our own page, or you can inspect and edit the specification in the off-site Swagger editor.
