Deep Green Unannotated Protein Structures

The Deep Green list is based on the identification and curation of conserved unannotated proteins in three model organisms of the green lineage (Viridiplantae): Arabidopsis thaliana, Chlamydomonas reinhardtii, and Setaria viridis. Deep Green proteins and genes were given a preliminary characterization using various informatics tools and published data sets, as presented in Knoshaug, Sun, et al., 2023, submitted. The structures of these unannotated proteins were also predicted using AlphaFold (Jumper et al., 2021). The data deposited here are the AlphaFold structural predictions with the highest pLDDT score, which are therefore identified as the best folded structures (ranked_0). These data enable others to carry out in-depth structural characterization in support of functional characterization, leading to a deeper understanding of plant biology.
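As an illustration of how the pLDDT scores mentioned above might be inspected, the following is a minimal Python sketch. It assumes the archives contain PDB-format files, which this page does not state, and it relies on AlphaFold's convention of writing the per-residue pLDDT into the B-factor field of each ATOM record. The function names, the model names, and the `demo_atom_line` helper are hypothetical and exist only for this example; this is not the authors' pipeline.

```python
from statistics import mean


def plddt_per_residue(pdb_text: str) -> list[float]:
    """Return one pLDDT value per residue, read from the B-factor of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed-width layout: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and len(line) >= 66 and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over all residues of one predicted structure."""
    scores = plddt_per_residue(pdb_text)
    if not scores:
        raise ValueError("no CA atoms found in PDB text")
    return mean(scores)


def rank_models(models: dict[str, str]) -> list[tuple[str, float]]:
    """Sort models by mean pLDDT, highest first (index 0 is the best model)."""
    scored = [(name, mean_plddt(text)) for name, text in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def demo_atom_line(serial: int, name: str, res_name: str, res_seq: int, plddt: float) -> str:
    """Hypothetical helper: build a fixed-width ATOM record for the demo below."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


if __name__ == "__main__":
    # Two made-up three-residue models; real input would be file contents.
    confident = "\n".join(demo_atom_line(i, " CA ", "ALA", i, 92.5) for i in range(1, 4))
    uncertain = "\n".join(demo_atom_line(i, " CA ", "ALA", i, 48.0) for i in range(1, 4))
    for name, score in rank_models({"model_a": uncertain, "model_b": confident}):
        print(f"{name}: mean pLDDT {score:.1f}")
```

To apply this to a deposited structure, the text of one file from an extracted archive would be passed to `mean_plddt`. Only CA atoms are read so that each residue contributes exactly one value to the average.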

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P. and Hassabis, D. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596:583-589.

Knoshaug, E. P., Sun, P., Nag, A., Nguyen, H., Mattoon, E. M., Zhang, N., Liu, J., Chen, C., Cheng, J., Zhang, R., St. John, P., and Umen, J. (submitted) Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis.
4 Data Resources
Name                       Size       Type     Resource Description
Deep Green folded          118.65 MB  Archive  Predicted structures of the Deep Green protein set
A.thaliana_unannotated     154.13 MB  Archive  Predicted structures for Arabidopsis thaliana unannotated proteins
C.reinhardtii_unannotated  592.15 MB  Archive  Predicted structures for Chlamydomonas reinhardtii unannotated proteins
S.viridis_unannotated      367.09 MB  Archive  Predicted structures for Setaria viridis unannotated proteins
Author Information
Eric Knoshaug, Biosciences Center, ORCID iD: 0000-0002-5709-914X
Peipei Sun, Donald Danforth Plant Science Center, ORCID iD: 0000-0001-6448-4620
Ambarish Nag, Computational Science, ORCID iD: 0000-0001-5174-4673
Huong Nguyen, Donald Danforth Plant Science Center, ORCID iD: 0000-0002-6549-2216
Erin Mattoon, Donald Danforth Plant Science Center, ORCID iD: 0000-0002-8303-2440
Ningning Zhang, Donald Danforth Plant Science Center, ORCID iD: 0000-0003-0375-0248
Jian Liu, University of Missouri - Columbia, ORCID iD: 0000-0002-7570-8690
Chen Chen, University of Missouri - Columbia
Jianlin Cheng, University of Missouri - Columbia, ORCID iD: 0000-0003-0305-2853
Ru Zhang, Donald Danforth Plant Science Center, ORCID iD: 0000-0002-4860-7800
Peter St. John, NREL, now at NVIDIA Corp., ORCID iD: 0000-0002-7928-3722
James Umen, Donald Danforth Plant Science Center, ORCID iD: 0000-0003-4094-9045
Cite This Dataset
Knoshaug, Eric, Peipei Sun, Ambarish Nag, Huong Nguyen, Erin Mattoon, Ningning Zhang, Jian Liu, Chen Chen, Jianlin Cheng, Ru Zhang, Peter St. John, and James Umen. 2023. "Deep Green Unannotated Protein Structures." NREL Data Catalog. Golden, CO: National Renewable Energy Laboratory. Last updated: December 12, 2023. DOI: 10.7799/1970473.
About This Dataset
DOE Project
Deep Green: Structural and Functional Genomic Characterization of Conserved Unannotated Green Lineage Proteins
High Performance Computing Center (HPC)
Funding Organization
Department of Energy (DOE)
Sponsoring Organization
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Digital Object Identifier
10.7799/1970473