This FAQ is a companion to the Enabling FAIR Data Commitment Statement and the Author Guidelines being implemented by publishers who are signatories. (“FAIR” is defined as findable, accessible, interoperable, and reusable in the FAIR Guiding Principles.) This FAQ is not meant to contradict anything stated in either the Commitment Statement or the Author Guidelines. Instead, it is meant to help provide answers to common questions.

 

1. Selecting a Repository

Q 1.1. Why is a domain repository that supports the FAIR Data principles the best place for my data? Why not my lab, laptop, thumb drive, Dropbox, personal website, or my department’s supported website? 

A. Data repositories provide core services that help maintain scientific data over time and support data citation, discovery, quality, and reuse. Domain repositories, in particular, provide knowledgeable quality control for data on certain topics and are connected with the research community for developing documentation standards and methodology. Providing them with your data helps support the development of those disciplines and allows integration of closely related data sets. Some more advanced domain repositories also provide specialized online tools for extraction, visualization, workflow documentation, and fusion that help make your data reusable. In addition, because your data receive a persistent identifier (e.g., a Digital Object Identifier), your work will be easier to cite and more widely known.

Q 1.2. How do I locate or select a FAIR-aligned repository?

A. A good repository will help make your data more valuable for current and future research. It will help you organize and deposit your data and will help ensure that the data can be cited, discovered, and preserved over the long term. Further attributes are listed in the answer to Q 1.3.

There are several resources for finding a good repository for your data. First, be aware of the main attributes of a FAIR-aligned repository (see Q 1.3). Primary resources to help you find a repository include

  • The Repository Finder tool, developed by DataCite for the Enabling FAIR Data project. This tool uses the content of re3data.org, a registry of repositories, to let you search by topic, and it lists repositories that are currently accepting data in support of publication, including those that are certified and support the FAIR principles. Some repositories are restricted in the types and sources of data they can accept (e.g., only from some funders or agencies). DataCite and re3data.org are working with repositories to make sure this information is current. The finder tool also provides repository contact information for questions or assistance. For more information, check out this DataCite blog post.
  • Your institutional librarian. Many universities are supporting research data management on campus, and such services are often provided through the library. Librarians can be an excellent source of research data management support, including repository selection, and can help you comply with university requirements.
  • Your computing center. Many high-performance computing (HPC) centers have infrastructure to support research using models and simulations, which may involve generating and/or analyzing high-volume data. The operations team at the center may have recommendations for data management, storage, and preservation.
  • Your funders. Funders may have requirements on repository use for data acquired under their programs, or other stipulations on metadata, documentation, access, etc. Contact your funding program representative to ensure that you are meeting all requirements.

New tools continue to come online. For instance, FAIRSharing.org is a “curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.”

Q 1.3. What are the criteria to consider when selecting a repository for my data, software, or other digital research products? 

A. Researchers are encouraged to find and deposit data in leading “domain” repositories, especially those that are FAIR-aligned, whenever possible. A “domain” repository (i.e., a repository that hosts data from a specific discipline or subdiscipline) will usually host specific types of data and have expertise in curating and making these data interoperable for that discipline. As a result, leading domain repositories help maintain data quality, provide a level of peer review for the data, and help data meet community standards to enable interoperability and reusability of your data.

In addition, consider the following when domain and/or FAIR-aligned repositories are not obviously available. Preference should go to repositories that

  • Provide a Digital Object Identifier (DOI) or other globally unique, permanent, and resolvable identifier for your data so that you and others can easily cite them.
  • Provide a landing page for your data or data sets so that the data are discoverable (e.g., by search engines and other repositories). Note that data in publication supplements are generally not discoverable; most supplements are not indexed or crawled by search engines and often are not captured by long-term e-archives.
  • Provide citation information on the landing page so that you and other researchers can easily cite the data. See the FORCE11 Data Citation principles for more details and ESIP Data Citation Guidelines for examples.
  • Have sustainable funding so that your data will persist and be accessible.
  • Provide transparent data license information so your data can be as open as possible and as closed as necessary. Examples of data licenses that support open and free use of your data include Creative Commons licenses, such as CC0 (public domain) and CC-BY 4.0 (free usage with attribution).
  • Are required by your funder.
  • Have an appropriate form of certification. Leading certifications at the time of this writing include CoreTrustSeal and ISO 16363 “Space data and information transfer systems–Audit and certification of trustworthy digital repositories.” Very few repositories are certified by ISO. The World Data System also certifies repositories through its membership application process (using CoreTrustSeal). Note that CoreTrustSeal accreditation may take some time to achieve, as these certifications are new and require additional funding. Many repositories that intend to become CoreTrustSeal certified are signatories on the Commitment Statement.
  • Help you prepare your data for deposit. Provide instructional documents that are easy to understand. Provide a curation expert to answer questions and assist with creation of documentation and metadata.
  • Are aligned with the research domain(s) and domain best practices associated with your data type.
  • Provide an embargo process to allow editors and reviewers of any associated manuscript to access the data during peer review (if required by your publisher).
  • Facilitate data discovery. Repositories should enable discovery of data through at least one search feature and be aligned with services that aggregate and index information across repositories, such as Google, DataONE, Global Change Master Directory, data.gov, or others.

Some of these attributes can be verified in the Repository Finder tool (see Q 1.2) or in the tool’s “about” information. However, many good repositories may not yet have all these attributes.

Q 1.4. What do you do once you have found a repository or have a list of possible repositories? 

A. See Q 1.2 and Q 1.3 for help in finding a repository. Once you have found a repository,

  • If your data are simple and you have worked with or contacted the repository, you may be able to simply upload your data. A good repository will have information on organizing your data and associated documentation (called metadata) that will enable reuse by others.
  • If you have questions or are unsure, contact the repository.
  • If you have a number of possible repositories, check with colleagues or your librarian, institution, computing center, funder, or journal to see whether they have a preferred choice.
  • Organize your data and information for deposition as requested or instructed by the repository.

Once you have deposited the data, include a proper citation to them in the reference section of your paper using the DOI of the data set (examples of data citations are here). Indicate in the Data Availability Statement that the data are archived and available in the <named> repository.

Q 1.5. What information should I have with me when contacting a repository?

A. Generally, you should have the basic citation information with you when you contact a repository: the names and affiliations of the authors, ideally their ORCID identifiers (or at least yours), funding information (some repositories take data only from certain funders or institutions), and whether the data are related to a pending publication. If you are not the owner of the data or have not been given permission to deposit the data, see Q 2.2.

Some other information a repository may ask for might include

  • What do the data comprise? How many files are there, and what are their sizes and formats?
  • Are you the legal owner of the data, and/or do you have the rights and permission to deposit the data?
  • Are the data linked to a publication? If so, is the deposit being made at initial submission (so that an embargo may be needed during peer review) or at the time of acceptance?
  • Are there additional scripts that are associated with the data processing that need archiving?
  • Do the data include sensitive, restricted, or personally identifiable information? If so, a description of the methods used to protect or deidentify the data will be helpful. When in doubt, contact the repository for specifics.
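If you are unsure how many files you have or what formats they are in, a short script can answer the first question in the list above before you contact the repository. The Python sketch below is only an illustration, not a repository requirement: it assumes your data sit under a single directory, and it treats each file's extension as a rough proxy for its format, which will not always be accurate.

```python
from collections import Counter
from pathlib import Path


def inventory(root: str) -> dict:
    """Summarize the files under `root`: how many, total size, and extensions.

    File extensions are used as a rough stand-in for formats (an assumption;
    a ".dat" file, for example, could hold almost anything).
    """
    # Walk the directory tree recursively, keeping regular files only.
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        # Lowercase extensions so ".CSV" and ".csv" are counted together.
        "formats": dict(Counter(p.suffix.lower() or "(no extension)" for p in files)),
    }
```

For example, calling inventory("my_data") on a hypothetical folder named my_data returns the file count, the total size in bytes, and a count of files per extension, which you can paste into your first message to the repository.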

Q 1.6. What if none of the repositories I found and reached out to are able to accept my data? What if I can’t find any repositories that will take my data?

A. We strongly recommend depositing data in domain repositories that are FAIR-aligned and have the CoreTrustSeal certification or are striving toward it (e.g., are signatories on the Enabling FAIR Data Commitment Statement). Such repositories exist for a large swath of the Earth, space, and environmental science data domains; there are leading repositories for climate data, paleoclimate data, geochemical data, oceanographic data, seismic and geospatial data, paleomagnetic data, and many more.

In some cases, however, these repositories may not be appropriate for your data or the repository may not be funded to take your data. In such cases, there are several general data repositories that follow leading practices. Unless you have extremely large or complex data, it is often possible to have your data accepted and uploaded easily. Lists and comparisons of general repositories are here and here. Check re3data.org for updated repository information.

Your institution may also provide a data repository that has many of the core attributes listed in Q 1.3. Check with your university librarian to understand the local options.

Q 1.7. When should I submit my data to a repository? 

A. For data associated with a publication, please see the Enabling FAIR Data Author Guidelines and your publisher’s guidelines for submitting data.

If possible, please submit your data before submitting your paper to a journal. This will allow reviewers to access your data in the form in which they will be published and allow the repository time for any data curation work. You can then also include a citation to the deposited data in the reference section of your paper. Many repositories will allow you to make updates or additions to the data, or create a new version, if it is necessary during the paper review process.

If it is not possible to submit your data before submitting your paper, the data should be submitted prior to manuscript acceptance so that a proper citation is included in the final publication, and the files should be provided (or access to the data made available) to the journal for peer review.

For the best support, work directly with your selected repository, preferably when writing your Data Management Plan (DMP) and prior to starting to collect your data. Repository staff can often help you set up your research with plans for the expected format and size of your data, documenting your data (i.e., metadata), and registering your data with a persistent identifier.

Q 1.8. What if the appropriate repository I found requires a fee for its services but my project or institution does not have funding for it?

A. Reliable funding is needed to sustain repository services, infrastructure, and data management experts. While most repositories do not currently charge a submission fee, the data community is exploring how to support the necessary repositories over time, and a fee is one viable business model. If you are not able to pay a submission fee, first check with the repository, as there may be a waiver for certain data or authors (e.g., those from developing nations). If not, please check other repositories including those in Q 1.6. In the United States, some funders have started allowing data work and submissions to be included in project budgets. Consider allocating resources in future proposals.

Q 1.9. My institution or funder requires me to deposit my data in a general repository or one that is not FAIR-aligned. I would rather put my data into a FAIR-aligned repository that is more specific to my field. What should I do?

A. In general, you should not receive two permanent identifiers or DOIs for your data. If your institutional repository is meant mostly for storage and assigns an internal tracking number only for file storage, then it is fine to also submit your data to a FAIR-aligned repository. Otherwise, please discuss the importance of FAIR with your institutional repository and funder.

It is possible for two or more repositories to enter into an understanding such that the data are stored in two or more places but there is one main repository for the authoritative data and a unique assignment of the DOI. This happens commonly with manuscripts where the DOI is assigned by the publisher yet many author copies of manuscripts are also housed in institutional repositories. In this case, the FAIR-aligned repository should be the primary caretaker of the data and you should indicate in the notes that a copy is available at another storage repository. Include the DOI from the authoritative FAIR-aligned repository in the metadata of the storage repository copy to avoid confusion.

2. Data Deposition and Sharing

Q 2.1. My research project is still ongoing and I have not yet fully explored all the data used in this publication. I’m generating other data as well. Making the data available now might invite others to exploit my data and thereby jeopardize my chances for future publication or funding. Am I still required to make the data available publicly? 

A. The requirement shared by publishers that support openness and the FAIR guidelines is that the data used, presented, and analyzed in the paper be made available in a FAIR-aligned repository at the time of publication. This requirement has been common across leading publishers for some time, in some cases for more than 20 years. It is meant to support the integrity of the publication, allow readers and the public to fully assess the claims, and provide for future use and reuse of the data and results. Only the data used in the publication are required to be made public. Depositing the data (and citing them) will make them available for others to reuse, also with a citation, so you will receive credit for making your data public and accessible. Funders also often require open data access, so not sharing your data could harm your chances for future funding support.

Q 2.2. A colleague generously shared his or her data with me with the understanding that I would not make the data public. Considering that such a professional agreement has been made, do the data still need to be deposited? If they must be deposited, do I have the right to do so?

I have obtained or used data from a company without permission to share them. As I do not have a right to share or deposit the data, what do I do?

A. Researchers should plan for publication of results, and for the need to comply with publication requirements, when entering into research agreements; this applies to patent applications as well. This is part of appropriate communication regarding private or industry collaboration on research projects. Where data ownership is held by another individual or organization, the researcher should explain the need for deposition and obtain permission to have the data deposited or, better yet, encourage separate deposition by the primary owners of the data so that the data can be appropriately cited and credited. Leading publishers no longer recognize or permit such agreements as a reason for not providing underlying data. If the data cannot be made available, publication is premature and should not be pursued.

In many cases, your university’s or organization’s legal department should be familiar with such requirements for publication and, on entering agreements, can provide much advice to protect you, members of your research team, and your students. In all cases, having agreements in writing is essential, and regular communication is helpful.

Some data covered by privacy laws and other regulations may be appropriately restricted or partially restricted. This would usually be specified in the Institutional Review Board (IRB) review of the grant or the grant process. Other data may be acquired by payment and/or available for purchase by others in a standard way. This is acceptable. For example, some data, data products, or software are commonly available for purchase. The means of access should be indicated in the Data Availability Statement (e.g., these data or software are available for purchase in this way). The key here is that a standard process should be specified for access, and this should be transparently reported. Journals will decide whether the access is standard and acceptable for their publication requirements. In general, one-off agreements that preclude access from other researchers or the public are not acceptable. Restriction on access for privacy reasons should have oversight from IRBs or independent institutions. Restrictions that effectively keep data hidden will usually be unacceptable. See Q 2.3 for standard exceptions.

Q 2.3. What are some examples of extenuating circumstances or data types that might qualify as good reasons not to make the data available publicly or where data can be available through restricted access?

A. Some acceptable restrictions on data access are

  • Location information or other unique data that may place vulnerable species or sites at risk (including endangered animals or plants, rare fossil or mineral locations, designated archaeological sites, and others).
  • Data containing personal information on human subjects. Confidentiality and the specific access restrictions in this case are usually specified by an Institutional Review Board or the grant. Some additional information is available here.
  • Where there are specific laws regarding access to certain sensitive data and the laws provide standard means for researchers to apply for access.
  • Where the data (or software) were purchased and access is available to others through the same process. For example, if a commercial software package or data set is used, the researchers can indicate where others can obtain similar access.

In these cases, authors should specify the restriction and the reason for it in the “data availability statement.” Authors should make expectations and requirements regarding data release clear in agreements with industry or third parties.

Q 2.4. Is the statement “data available on request from the authors” an acceptable alternative?

A. No. There are sufficient domain and general repositories, including institutional repositories, available for nearly all data sets. Many studies show that “data available on request from the authors” is a burden on authors and rapidly leads, in some cases, to lost or missing data. Submitting data to a FAIR-aligned repository provides secure and lasting access, quality, and a citation and credit for the researcher, among other benefits.

Q 2.5. Is it appropriate to share my data in the main text or the supplement of the research article?

A. In the past, data were often included as supplements to papers or as tables in the main text. The problem is that supplements are generally not indexed and have little or no metadata, so the data in them are not discoverable and may not meet standards for quality or format. Publishers were not always archiving data appropriately. Further, over time, data in these supplements often became locked in obsolete formats.

The publishers that have signed the FAIR Data Commitment Statement have agreed to discontinue supplements as the primary archive for data and use FAIR-aligned repositories instead. Supplements can still be used for descriptions of the project; details on methodology; or additional images, resources, or figures that are not primary data.

Q 2.6. Can I deposit data that were collected during the study but which were not described in this particular manuscript or article? I don’t plan to publish the analysis of the additional data elsewhere, but the data were collected according to the protocols described in the article and may be useful to other researchers.

A. Yes; indeed this is to be encouraged! Please indicate this in your deposition. You can still get a DOI and a citation for the data so that others who reuse your data can add the reference to their papers.

Q 2.7. My study analyzed data that are derived from data that can be downloaded from an existing repository. Can’t I just refer to the existing repository and include in the methods section a description of how I obtained and processed the data?

My data set is an aggregated collection of existing data from multiple sources. I can share this as a new (derived) data set, but how do I make sure that connections to the previous data are maintained? 

A. If your study compiles or uses preexisting data in a new analysis, you should, just like when referencing other papers, cite the existing data when used in the text and include full references to these data in the reference list.

If you have a new data compilation that includes previous data, you can deposit this new compilation as a new data set or data product related to your paper. The information about the new data product should include the existing data that were cited and included. Your repository can help you with including this information.

If your data are a subset of an existing data set, or the subsets of the data set are too large to store elsewhere, you should provide the methods used to extract the data, the version of the original data used, and the date of extraction so that other researchers can extract those data.
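One way to capture the three items named above (the extraction method, the version of the original data, and the date of extraction) is a small machine-readable record deposited alongside the subset. The Python sketch below is illustrative only: the field names, the JSON layout, and the placeholder DOI 10.1234/example-source are hypothetical, and your repository may prefer its own metadata fields.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class ExtractionRecord:
    """Hypothetical provenance record for a subset taken from an existing data set."""
    source_citation: str  # full citation of the original data set
    source_doi: str       # persistent identifier of the original data set
    source_version: str   # version of the original data that was used
    extracted_on: str     # date of extraction, ISO 8601 (YYYY-MM-DD)
    method: str           # query, filter, or script used to extract the subset


def to_json(record: ExtractionRecord) -> str:
    """Serialize the record so it can be saved next to the extracted files."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Illustrative values only; the DOI below is a placeholder, not a real data set.
record = ExtractionRecord(
    source_citation="Example Author (2020). Example global data set, Version 2.1.",
    source_doi="10.1234/example-source",
    source_version="2.1",
    extracted_on=date(2024, 1, 15).isoformat(),
    method="Rows with latitude between 30 and 45 degrees N, years 2000-2010",
)
print(to_json(record))
```

Keeping the record as plain JSON is a design choice for the sketch: it is readable by people and by scripts, and it can be deposited as one more file in the data set.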

Please check with the repository, as they will provide advice on how to cite and reference extracted data.

Q 2.8. My data are heavily processed. The raw data collected at the instrument are corrected almost immediately through a series of standard routines, and the raw data have never been reported in the literature. The data that are typically reported are highly derived or may even be restricted to deviations from a reference data set (e.g., many genomes can be reported as a deviation from a standard reference). What should I do? What is required? 

A. One of the advantages of working with standard domain repositories is that they help define community standards on data reporting for each discipline. In general, publishers have agreed to follow the leading practices in data reporting by discipline, as represented by the usual derived data that are archived at leading repositories. These are not typically the raw data collected at an instrument. You should report the standard processes and software versions by which any derived data were produced and how instruments were calibrated. The data required to be deposited are those associated with the new results and analyses reported in the paper, usually as represented in figures. If average or summary data are presented in the paper, including image contours, the deposited data should be those that form the basis of those averages or summaries.

Many satellite, telescope, and sensor data have standard means for processing, correction, compression, and storage of raw or partially processed data. Unless your study uses a new routine for these, these steps or an appropriate reference can simply be cited as part of the methods section.

Q 2.9. My data are mostly model output or are derived summary figures based on numerous model runs and outputs. The collective output could easily exceed 1 TB. What am I required to deposit?

A. Please see section 4 below on how to cite and deposit software. If you are part of a modeling center that has a standard archiving plan for models and run output, please follow that plan (such centers are an acceptable repository). In general, the most important information to provide on models is the code (and version used) and unique configurations, any input parameters, run files, and a description of the overall run environment and parameter space tested. Representative output can also be deposited, ideally key data that form the figures included in the paper.

Q 2.10. My funder has indicated that my group or other similar groups using a community facility can have restricted access to a data set or data stream from an instrument for a period of time, including for a year or more. Can I hold these data and release them at the end of this time?

A. This is an agreement between you and your funder. When you or your group submit a paper, you enter into a separate, additional agreement with the publisher to support the integrity of published research. As part of that agreement, the data supporting the published result, but not other unrelated data, need to be made available at the time of publication as indicated in these guidelines. Funders support this transparency in publication, including release of the supporting data before their separate embargo period ends.

Q 2.11. We collect data on a weekly basis that should be shared as soon as they become available to be useful to the public and to researchers (e.g., recreational water quality data). Are there limitations on how frequently we can add new data to a repository? 

A. Several repositories allow data to be continuously added to a data set and provide a means to cite extracts or refer to a subset of the data in various ways. Please check with the specific repository for their recommendations on dynamic data submission, citation, documentation, etc., so that these data can be appropriately used and referenced by others.

Q 2.12. What if my data are multidisciplinary and are best deposited in two or more repositories? 

A. This can be acceptable to many journals. Work with your repositories to ensure the data are appropriately connected. The divided data will have separate DOIs, and the description of each data set should provide a reference to the paper and other data. It is also possible to have several data sets (with different identifiers) at the same repository.

Q 2.13. My data are huge, terabytes or petabytes in size. How can I share them?

A. At the time of this writing, data of this size are best stored on-site, as the data are too large to transmit and exceed the limits of most domain or general repositories. Check with your repository on any size limits. If you are regularly collecting or producing this much data, you should check with your funder, facility, or institution for an on-site archiving plan. If these data are model output, please see Q 2.9.

3. Data Availability Statement and Data Citation

Q 3.1. Why is a Data Availability Statement required?

A. The Data Availability Statement (DAS) is a short description added to paper submissions that describes the new data the authors are placing into the record and how they can be obtained. It helps readers quickly identify the extent of the new data and where to find them. Because the statement is easy to identify in the paper, it also makes a broader view of data availability across scholarly publishing possible. The statement is also important as a public declaration by the authors that all relevant data are available to others, and it can be used to clarify any restrictions on access to the data. Research has shown that including such a statement leads to improved access to the data and better and faster editorial processing.

The DAS should include confirmation that the data underlying the paper exist, information on where the data can be found, persistent identifiers where available, licensing restrictions, and access requirements (e.g., registration or fees). Data included in the DAS should be cited in the reference list.
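The elements listed above lend themselves to a simple checklist. The Python sketch below, offered only as an illustration, stores those elements, reports any required ones that are still empty, and drafts a one-sentence statement that also serves as the confirmation that the data exist. The class name, the sentence wording, and the assumption that the persistent identifier is a DOI are all hypothetical; follow your journal's own template for the actual statement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataAvailabilityStatement:
    """Hypothetical checklist for the elements a DAS should contain."""
    repository: str                # where the data can be found
    license: str                   # licensing terms, e.g. "CC-BY 4.0"
    access_requirements: str       # e.g. "none", "free registration", "fee"
    doi: Optional[str] = None      # persistent identifier, where available (assumed to be a DOI)

    def missing_elements(self) -> List[str]:
        """Return the required elements that are still empty."""
        missing = []
        if not self.repository.strip():
            missing.append("location")
        if not self.license.strip():
            missing.append("license")
        if not self.access_requirements.strip():
            missing.append("access requirements")
        return missing

    def render(self) -> str:
        """Draft a one-sentence statement; the wording is illustrative only."""
        where = self.repository
        if self.doi:
            where += f" (https://doi.org/{self.doi})"
        return (
            f"The data underlying this article are available in {where} "
            f"under a {self.license} license; access requirements: "
            f"{self.access_requirements}."
        )


# Placeholder values, not a real repository or DOI.
das = DataAvailabilityStatement(
    repository="Example Repository",
    license="CC-BY 4.0",
    access_requirements="none",
    doi="10.1234/example",
)
print(das.missing_elements())
print(das.render())
```

The DOI is optional in the sketch because the list above asks for persistent identifiers only where available; the location, license, and access requirements are treated as required.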

Q 3.2. Why is submitting data to a repository and adding a data citation to my references better than putting my data into the body of the manuscript or in the supplementary information?

A. Repositories actively manage data, ensuring that data discovery and access are maintained over time. Many repositories provide numerous other services to enable analysis, visualization, integration, etc. More generally, citing data is now a broad community standard endorsed by numerous publishers, societies, and institutions. As stated by the Enabling FAIR Data Commitment Statement, sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice and in theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse. Data citations allow credit to the provider of the data in the same way that citations to manuscripts provide credit to their authors.

Data citation also enables initiatives like Scholix (Framework for Scholarly Link Exchange) to provide information on existing links between scholarly literature and data via the query service ScholeXplorer, thus improving data findability.

Q 3.3. What data should be cited? Just my new data, or just the data that I have reused from others?

A. All data used in the publication should be cited in the references, just like you would reference another paper. The Data Availability Statement should indicate the location of all necessary data. Please follow the FORCE11 Data Citation Principles for all data citations.

Examples of data citations within the Earth, space, and environmental sciences can be found here, developed by ESIP’s Data Stewardship Cluster.
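As a rough illustration of the elements a data citation usually carries (authors, publication year, title, version, repository, and a resolvable persistent identifier), the Python sketch below assembles them into one reference string. The ordering and punctuation are an assumption loosely modeled on common data-citation examples, not a required format; optional elements such as an access date are omitted, and the journal's reference style takes precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the core elements of a data citation."""
    authors: List[str]             # creators, in the order listed by the repository
    year: int                      # publication (release) year of the data set
    title: str                     # title of the data set
    repository: str                # archive or distributor holding the data
    doi: str                       # persistent identifier, without the resolver prefix
    version: Optional[str] = None  # data set version, if the repository assigns one

    def render(self) -> str:
        """Join the elements into a single reference-list entry."""
        version = f", Version {self.version}" if self.version else ""
        return (
            f"{', '.join(self.authors)} ({self.year}). {self.title}{version} "
            f"[Data set]. {self.repository}. https://doi.org/{self.doi}"
        )


# Placeholder authors, title, repository, and DOI.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example ocean temperature profiles",
    repository="Example Repository",
    doi="10.1234/example",
    version="1.0",
)
print(citation.render())
```

With the placeholder values above, the output is "Doe, J., Roe, R. (2020). Example ocean temperature profiles, Version 1.0 [Data set]. Example Repository. https://doi.org/10.1234/example".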

Q 3.4. What does “provenance” of data mean? 

A. The provenance of your data describes the individual steps taken to reach the final data. A description of this at a high level is usually included in the methods section of the paper. It should also be included in the description of the data when they are deposited in a repository. It is helpful to add a link and a citation to your paper in your data set description to close the loop. Many repositories now allow you to include the actual scripts that were used to process and analyze the data, or links to them, or even connect these directly to the data. In some cases, links may also be included in the metadata records. For example, with MODIS data, each data file is accompanied by a metadata file that contains identifiers for all the data products used as input and the software (including version information) used to create that data file.

4. Software Citation and Curation

Q 4.1. Why are software citations important?

A. Software citations, like data citations, provide a means to reference software and give credit to its developers. The FORCE11 Software Citation Working Group has developed a set of community principles for software citation. These state, “Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.”

Q 4.2. What repository can I use for my software?

A. Any repository that stores software can be used, including Zenodo, Mendeley Data, and figshare, as well as institutional repositories such as Caltech's. These repositories provide a globally unique identifier, such as a Digital Object Identifier (DOI), for the specific version of the software that is archived, along with metadata about the software. The globally unique identifier resolves to a landing page that presents the metadata in human- and machine-readable form and includes a link to the archived software. Please also see Q 3.2.
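The machine-readable side of a DOI can be reached through DOI content negotiation: a request to `https://doi.org/<DOI>` with an `Accept` header for a metadata format returns that metadata instead of the landing page, for DOIs registered with agencies that support it (Crossref and DataCite do). The sketch below only builds such a request with Python's standard library and does not send it. The helper name `metadata_request` and the format shortcuts are hypothetical, and `10.1234/example` is a placeholder DOI.

```python
import urllib.parse
import urllib.request

# Media types accepted by DOI content negotiation (subset shown).
FORMATS = {
    "csl-json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
}


def metadata_request(doi: str, fmt: str = "csl-json") -> urllib.request.Request:
    """Build (but do not send) a request for a DOI's machine-readable metadata."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Percent-encode unusual characters but keep "/", which every DOI contains.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})
```

Passing the returned request to `urllib.request.urlopen` (network access required) should yield CSL JSON or BibTeX for a registered software or data DOI; opening the same URL in a browser shows the human-readable landing page instead.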

Q 4.3. Is GitHub okay as a repository?

A. No. GitHub is not an archival repository; it is a social code development and sharing site. GitHub does not have software archiving as part of its mission, and it only stores limited metadata. On the other hand, GitHub is a good place to develop, discuss, and contribute to software.

GitHub's own documentation explains how to make your code citable by linking a GitHub repository to Zenodo, and there is a similar guide for linking a GitHub repository to figshare. Either method lets you archive a version of your software from GitHub in a repository, which will assign it a DOI that you can ask users of your software to include in their citations.

Note that there are several third-party archival projects aiming to archive GitHub public repositories. These are referenced on GitHub here. Furthermore, one of these third-party projects, the Software Heritage Archive, is aiming to provide findability, citability, and long-term preservation in alignment with FAIR principles. Archiving software will preserve the specific version of the software that was used and provide a link to the software product in its development environment, such as GitHub. While these efforts are trying to fill in the software archival gap, remember to be proactive and specify plans to keep your software safe and citable.

5. Physical Samples

Q 5.1. How should I describe my physical samples and make them available?

A. Samples in the Earth, planetary, and environmental sciences can include drill cores, rocks, soil samples, mineral specimens, water, solid ice, air samples, fossils, meteorites, or others. Museums and other sample curation organizations have standard ways of describing their collections; working with the organization curating collections of samples like yours is a good place to start. More generally, a standard system including a globally unique, persistent identifier has been developed to describe diverse samples: the International Geo Sample Number (IGSN). IGSNs can be used for both larger samples (e.g., entire drill cores) and subsamples (e.g., grain-size fractions, splits/pieces, mineral separates, etc.) derived from these. In order to obtain IGSNs for your samples, you should register them by submitting a standard set of metadata to an IGSN Allocating Agent. The IGSN is linked to the standard descriptions and metadata. Use of the IGSN allows discovery and integration of data for samples reported in different publications and other online resources. IGSNs should be listed in manuscript data tables and descriptions and also in related data set metadata. Further information is available here.

It is best to create or reserve IGSNs at the start of a project when you are collecting samples, but they can be obtained later as well.

Q 5.2. Do I need to make my physical samples openly available?

A. Many Earth science samples are collected after considerable effort and may represent valuable, rare archives (e.g., rare fossils, ice cores, or planetary samples). Ideally, these are or should be curated at a public facility that provides standard protocols for curation and access to preserve a portion of key samples for future research. Many publishers require that at least vertebrate fossils be archived at public facilities and museums (not in private museums) so that they are reliably available for research. At the same time, many of these museums and facilities are limited in how many and which samples they can accept and curate. Researchers are encouraged to check with their own institutions about curating research samples that these facilities cannot accept.

6. Enabling FAIR Data Project Questions and Additional Resources

Q 6.1. What is the purpose of the Enabling FAIR Data project?

A. The Enabling FAIR Data project is described here. Its goal is to ensure that Earth, space, and environmental science research outputs, including data, software, and samples or standard information about them, are open, FAIR, and curated in trusted domain repositories whenever possible and that other links and information related to scholarly publications follow leading practices for transparency and information.

Q 6.2. How do I support FAIR data?

A. If you would like to help, you can support the Enabling FAIR Data project in the following ways:

  • Sign the Enabling FAIR Data Commitment Statement. Individual and organizational signatories are welcome.
  • Bring the statement and this project to the attention of your colleagues, department, institution, and funders.

Q 6.3. How is the policy going to be executed across all publishers?

A. The Earth, environmental, and space science publishers that have signed the Commitment Statement will adopt common guidelines, principles, and practices for sharing data and software and for reporting on samples. They aim to give researchers a consistent experience and set of expectations, both to improve the integrity of publishing by making data FAIR and to standardize the process for authors. A further aim is to help authors collect and organize their data from the point of collection onward. Funders strongly support these goals.

Q 6.4. How will the work and results from the project be grown and sustained over time?

A. Several of the signatories of the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) and Enabling FAIR Data statements are continuing to work to improve practices in collaboration with many other groups across the sciences. We are also in conversation with groups representing other disciplines.