    Jeffrey Engler and Julia Kent, Council of Graduate Schools

    The challenge of Big Data in graduate student research

    Large datasets present exciting new opportunities for the U.S. and global research enterprise. Indeed, “big data” approaches to research have the potential to develop new knowledge and innovations across nearly every broad field of study, particularly in the biomedical sciences, computer science, engineering, and the social sciences. Yet the methods used to assemble large datasets, and their applications in decision-making contexts, challenge existing ethical paradigms for data management, data integrity, human subject protections, and data use. In many fields, for example, aggregating data from different sources can make privacy protections for human subjects more complex, and raise questions about data ownership. In others, the use of algorithms and predictive analytics may lead researchers to influence—not simply predict—human behaviors. Unfortunately, current attempts to identify and address these challenges are often focused within specific disciplines or corporate settings and offer little opportunity to integrate these evolving ethical concerns within graduate programs preparing the next generation of researchers.

    The graduate dean’s role in training in academic integrity

    Graduate deans often oversee professional development and responsible conduct of research (RCR) training curricula. They are uniquely positioned to present the ethical concerns of big data research to their university communities and to bridge potential silos that impede the sharing of best practices to address these evolving challenges. To address this gap in graduate student preparation, the Council of Graduate Schools (CGS) and PERVADE (Pervasive Data Ethics for Computational Systems) embarked on a project to better understand the challenges and opportunities universities face in preparing graduate students in the ethical use of big data. Our goals were to identify both broad and specific ethical challenges that arise from the use of big data resources in graduate student research; to discuss and evaluate existing resources for training in the ethical use of big data; to identify potential levers for introducing and discussing these challenges, and for engaging Principal Investigators (PIs) and advisors in helping students prepare for them; and to formulate potential strategies for deploying and embedding resources for big data ethics within academic programs, professional development opportunities, and RCR training.

    Workshop on expanding graduate training in big data ethics

    With generous funding from the Office of Research Integrity (ORI) and Elsevier, CGS and PERVADE convened a diverse group of graduate education leaders around these topics. The virtual event, held in April 2021, brought together graduate deans, experts in the ethics of big data research, and representatives from disciplinary societies and other organizations. This report synthesizes lessons learned from this event with the goal of informing and strengthening efforts to prepare graduate students for the challenges of big data research.

    The five major conclusions and recommendations from this collective work are intended to stimulate further action and reflection in the research and graduate education communities.

    Conclusions and Recommendations:

    1. The ethical challenges of research involving big data are relevant to a large population of master’s and doctoral students and should be broadly integrated into graduate research training. While big data methodologies are sometimes seen as a hot topic or novel innovation, they should not be relegated to specialized training programs or courses.
    2. The research and graduate education communities should evaluate current RCR curricula and ensure that they address challenges in big data. Research with large datasets is changing the way we need to teach several categories of RCR training recognized by the federal government, in particular, collaborative science; data acquisition, management, sharing and ownership; and human research protections.
    3. Plans to expand graduate research training to include ethical issues in big data research should include the participation of a broad range of stakeholders. Universities should involve faculty, students, Institutional Review Boards (IRBs), Vice Provosts for Research, IT staff, and others from many different disciplinary training programs in their efforts. Students benefit when different groups on campus communicate and collaborate on a coordinated approach.
    4. Graduate deans, as the individuals with the broadest responsibility for the quality of graduate student research training, should play a lead role in supporting and facilitating institution-wide collaborations. The graduate dean community has a strong track record of supporting communication and collaboration across campus with the goal of improving and expanding student learning and professional development.
    5. Universities, organizations that support graduate education, and funders should increase their efforts to develop resources that prepare graduate students for the ethical challenges of research using large datasets. As resources such as curricula and case studies become available, these groups should work together to make them centrally available.

    All workshop participants agreed that discussion of this developing ethical concern should continue, to support graduate deans and other institutional leaders in expanding their efforts to promote the ethical use of these research methods and databases. CGS will continue to provide updated resources on this developing area of research integrity training, as well as a forthcoming report, Preparing Graduate Students for the Ethical Challenges of Big Data.
