Read on to learn more about NREL's public data submission requirements and process. If your question is not answered here, please contact us.
Data Submission Requirements
Why should I submit NREL data?
Registering certain datasets with the Department of Energy is required by law. When you submit your data you increase public access to the results of federally funded scientific research, as outlined in the DOE Public Access Plan (PDF). Making the results of federally funded research publicly available is important for the advancement of science. It increases the visibility of NREL and our research, and as a result, increases opportunities for new collaborations and partnerships.
Who needs to submit data and metadata?
The principal investigator (PI) of a funded project has the responsibility for data submission, although the PI may delegate this responsibility to another researcher or analyst.
The Data Catalog application provides fields for contact information and prepopulates them as a convenience. You may adjust these fields so that data consumers have a way to ask questions about the dataset.
What is an ORCID iD?
ORCID is an open, non-profit, community-based effort to provide a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers. ORCID is unique in its ability to reach across disciplines, research sectors and national boundaries and its cooperation with other identifier systems.
For example, suppose two authors share the same initials, John Smith and Smith, J.: one is at NREL and the other is at Colorado School of Mines. By recording their respective ORCID iDs, each author gets credit for their own work.
An external link for starting the ORCID registration process has been provided in the form. There is no cost associated with getting an ORCID iD. For more information, see About ORCID.
When should I submit my data?
You should submit your data as soon as the PI considers that the data for your funded project is final. This can vary from project to project, and even within projects. The PI should work with the data stewards to determine at which stage of the research lifecycle specific datasets will be submitted and made publicly available.
If data is referenced in a technical report, the data and the report should be published concurrently.
What Data to Submit and Not Submit
What kinds of data should I submit?
You should submit data that is created at, or for, NREL using federal funds, provided that the data is not proprietary, does not contain any Personally Identifiable Information (PII), and does not contain any classified information.
According to the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy, research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Learn more from the Office of Science's Statement on Digital Data Management or see What should I not submit? below.
What do I enter in the Organization and Funding Organization fields?
The Organization field is for the lead-funded organization for the project; the default is National Renewable Energy Laboratory.
The Funding Organization field offers drop-down choices. If the research is DOE funded, further information is required. This information will be disclosed in your award or contract. For assistance, contact your NREL Laboratory Program Manager (Lead Project Manager).
What should I not submit?
Any personally identifiable information, business proprietary information, or copyrighted material should NOT be submitted.
Personally Identifiable Information (PII) is any piece of information or combination of pieces that could be used to compromise the identity of an individual. A person's name alone is not considered PII, especially in the case of attribution.
Contact information, such as personal email and home addresses, should not appear on any submitted data. A submitter's contact information is required but will only be used for questions about the data submission. Contact information for organizations is also acceptable, including office email, the office address, coordinates, and phone and fax numbers.
Personal information such as home telephone numbers, email and home addresses, and birth dates, is not allowed. Furthermore, private information such as social security numbers, bank account numbers, passport and driver's license numbers, is expressly forbidden. All submissions should be purged of PII prior to submission.
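As a rough illustration of purging PII before submission, the sketch below scans text for patterns that resemble U.S. social security numbers, email addresses, and phone numbers. The patterns and the function name find_possible_pii are assumptions made for this example and are not part of the Data Catalog. A scan like this only flags candidates for human review: it will miss many kinds of PII, and it will also flag office contact details that are acceptable.

```python
import re

# Hypothetical patterns for illustration only; real PII takes many more forms.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_possible_pii(text):
    """Return (line_number, label) pairs for lines that match a PII pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, label))
    return hits
```

Each hit should be checked by a person before the file is edited, since an office email address or phone number is allowed while a personal one is not.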
Business Proprietary Information (limited rights) should also not be included in the data submitted, as all data submitted will eventually be made available to the public. Data subject to copyright, business arrangement, publication or purchase agreement should not be uploaded to the Data Catalog.
Copyrighted Material of any kind, including journal articles, should not be uploaded to the catalog. However, if the material is publicly available and permanently hosted on another site, you can link to it using the Add Link button.
How to Submit Data
To submit data, select Login, select My Datasets, and then click the Add Dataset button.
How should I organize my submission?
Data can be submitted as a single, consolidated submission or in multiple submissions. An individual submission can contain an unlimited number of data resources (files and links), but each resource must have a unique name within the submission. Submissions should be grouped into logical sets, associating like data together so that elements necessary for the comprehension of a resource are not in a different submission. If needed, a previous submission may always be linked to from a newer submission as one of its resources.
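Because each resource must have a unique name within a submission, it can help to check a planned list of names before entering them. The following is a minimal sketch; the function name is hypothetical, and treating names as equal regardless of case or surrounding spaces is an assumption, since the application's exact comparison rule is not documented here.

```python
from collections import Counter


def duplicate_resource_names(names):
    """Return the normalized names that appear more than once."""
    # Assumption: compare names case-insensitively and ignore outer whitespace.
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means that every planned resource name is distinct under this stricter comparison.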
Combining resources by zipping or archiving should be done when the resources are of little use individually. For example, the zipping of individual shapefile components into a single shapefile resource is strongly encouraged. Consider how users will interact with your data before combining data into an archive.
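The shapefile case can be sketched with Python's standard zipfile module. This example assumes the components sit side by side and share a base name (for example sites.shp, sites.shx, sites.dbf); the function name and the list of extensions are illustrative choices, not a Data Catalog requirement.

```python
import zipfile
from pathlib import Path

# Common shapefile components; .shp, .shx, and .dbf are the mandatory ones.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def zip_shapefile(shp_path, archive_path=None):
    """Bundle a .shp file and its sibling components into one zip archive."""
    shp = Path(shp_path)
    archive = Path(archive_path) if archive_path else shp.with_suffix(".zip")
    # Keep only the components that actually exist next to the .shp file.
    present = [
        shp.with_suffix(ext)
        for ext in SHAPEFILE_EXTENSIONS
        if shp.with_suffix(ext).exists()
    ]
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for part in present:
            # Store each component at the archive root, without directory paths.
            zf.write(part, arcname=part.name)
    return archive, [p.name for p in present]
```

Calling zip_shapefile("sites.shp") would write sites.zip next to the original files and return the archive path along with the names it bundled.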
How do I describe my data?
Describe your data in ways that will allow your colleagues, clients, sponsors, and others to easily find your data when searching the catalog. Consider these questions when describing your data:
- Who created the data resource?
- What is in the data file(s)?
- When, where, why, and how was the data captured or collected?
- What would someone need to know to use the data properly?
Choose keywords that you would use yourself if you were searching for this dataset. Keywords can be locations, abbreviations, or anything else that describes your data. The application performs full-text searches, as does the DOE Data Explorer, so limit keywords to relevant topical categories.
A location helps make your data more discoverable. When someone searches for data in a specific region, Google and other search engines rank datasets that mention a location above those that do not.
What do I do if my file is too large?
Files over 3 GB cannot be uploaded. Instead, the application provides the ability to register links to external file services and other websites. The choice of a location to host data is a complex decision.
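Given the 3 GB cap on stored files, a quick size check can show in advance which resources can be uploaded and which need to be registered as links. This is a sketch under assumptions: the function names are hypothetical, and because it is not stated whether the application counts a gigabyte as 10**9 or 2**30 bytes, the stricter value is used and a file of exactly the limit is treated as too large.

```python
import os

# Assumption: 3 GB is taken as 3 * 10**9 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 3 * 10**9


def choose_resource_type(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return "upload" for files under the cap, otherwise "link"."""
    return "upload" if size_bytes < limit else "link"


def plan_resources(paths):
    """Map each local file path to "upload" or "link" based on its size."""
    return {path: choose_resource_type(os.path.getsize(path)) for path in paths}
```

Files marked "link" would be hosted elsewhere and registered with the Add Link button.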
How do I link to a file on another site?
Simply link to the file using the field provided after clicking the Add Link button. The link you submit must be a permanent URI (i.e. a URL that leads directly to a resource and does not pass through a search page or require more than one click to navigate to the data).
Examples of good, permanent URLs
- http://goodsite.com/conference/paper-13.pdf
- http://goodsite.com/the+title+of+the+paper.pdf
Examples of bad, temporary URLs
- http://badsite.com/search?conference=WorldScience&paper=13
- http://badsite.com/node/13
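The difference between the good and bad examples above can be approximated with a simple heuristic. The sketch below is an assumption-laden illustration and not the check the Data Catalog performs: it rejects URLs that carry a query string and requires the last path segment to look like a file name. Some genuinely permanent links, such as DOI resolver URLs, would fail this test, so treat a negative result as a prompt to look more closely.

```python
from urllib.parse import urlparse


def looks_permanent(url):
    """Heuristic: True if the URL appears to point directly at a file."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    if parts.query:
        # Query strings usually indicate a search or session page.
        return False
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Require a file-like final segment such as "paper-13.pdf".
    return "." in last_segment
```

Applied to the four example URLs, the function accepts the two goodsite.com links and rejects the two badsite.com links.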
When will my data be available for download?
The data will be available as soon as it has completed the data curation process, which typically takes less than two weeks. A digital object identifier (DOI) will be available after the dataset has been approved by DOE's Office of Scientific and Technical Information. For an update on your data, please contact us.
When can I modify an existing submission?
A dataset may be modified until the point that it is made publicly available. After this step, you should create a new submission with a reference to your original submission included in the description. Include a link to your original submission using the Add Link button in the new submission.
What if only some of my files are subject to moratorium?
Moratoriums apply to entire submissions. All accompanying files will be subject to the moratorium. To expose select files at different times, they must be in separate submissions.
Can I make changes to my submission?
While data is in the "awaiting curation" phase, you can make changes or add more information, but you cannot add more files. Once it goes into "curation" mode, it is locked down. The curator will not edit your data, but will review it to be sure it aligns with the description. If there is something wrong, the curator will contact you.
Changes cannot be made after the data has been made publicly available. If a change is needed after that point, you will need to create a new dataset and link it to the original dataset with an explanation of what has changed. If you need help, contact us.
Where should I house my data files?
This question of where data should go is complicated by legal, aesthetic, political, and technical considerations, and so there is no simple answer.
NREL is required by law to make certain data publicly available and machine readable by registering the datasets at OSTI. As NREL's STIP representative, Rob Finger has sole authority to register or delegate registry of this information for the enterprise.
To that end, we have built the Data Catalog. The Data Catalog's primary purpose is to register datasets, not house them. We built in three major features that provide value back to the users of the application: an automatically generated landing page, convenient repository storage for files smaller than 3 GB, and (eventually) integrated usage metrics through file download reporting.
Although Principal Investigators are required to register datasets through it, the Data Catalog has limitations that may lead them to house their data separately. First, there is the file size limitation. File storage was added as a convenience for small projects without the budget or the need to create custom landing pages. It is backed by a full content management system. To prevent abuse, the file size is capped at 3 GB, so this eliminates a small but growing number of projects.
Another technical consideration in determining data housing is the stability of the data. OSTI places strict limitations on changes to the datasets that are often incompatible with research needs. If data are changing or growing, then the Data Catalog is not the right place to store data because it would require resubmitting records and because it would require costly and complicated features that are out of scope of the primary mission. The Data Catalog provides the ability to create links to data to allow the flexibility to register datasets without managing the data within the app.
While the Data Catalog automatically generates a landing page, it does not allow branding or flexible inclusion of content. It is bland and generic by design. Users with a budget and a need for better marketing will need to find alternatives to meet their needs. This will increase with NREL's growing number of institutional collaborations. The location of data can become a matter of contention when working with partners. Communications representatives can provide services that add value by creating attractive, branded products that increase NREL's prestige and credibility: this should not be discounted. Strict uniformity of presentation harms us.
There are a number of criteria that determine suitability for inclusion in a dataset. The projects must be supported with DOE funding to receive full treatment, for instance. If NREL is the primary awardee then we are required to register the data, but may also register it if we are not. Data supporting charts, tables, and figures in peer reviewed journal articles are required by law to be publicly available and machine readable. Also, enough data should be provided to verify the research. A well-designed data management plan will lower the cost of providing this level of detail.
All of these issues make for an effort that is difficult to manage, requires cooperation and coordination within Communications and with other centers, and requires that personnel understand the requirements and the issues. It is not wise to force users through restrictive requirements around the location and presentation of their data. Feel free to seek advice when you are making complicated decisions about your public data to reflect the best interests of the enterprise.
Metadata and Data Curation
What is metadata and why do I need to submit it?
Metadata refers to data and information that describe other data. Metadata summarizes your data so that others can easily find and work with them. Many of the metadata fields requested are required to meet data management guidelines from DOE, GSA, and other government agencies. These requirements are designed to promote the discovery of your data, increase their exposure to the scientific community, and enable their proper use.
What is data curation?
Data curation is the process of checking artifacts for prohibited information, such as PII, ensuring that files are not corrupt, and verifying that metadata meet basic standards.
If you have additional questions on the submission process, please contact us.