ChatGPT for Next Generation Science Learning
This article pilots ChatGPT on some of the most challenging aspects of science learning and finds it successful in automating assessment development, grading, learning guidance, and the recommendation of learning materials.
The United States is at the forefront of a worldwide movement to reform science education, as outlined in A Framework for K-12 Science Education [1]. The Framework posits that science learning is most meaningful when students are actively engaged in using science disciplinary core ideas (e.g., force and interaction) and crosscutting concepts (e.g., cause and effect) to solve scientific problems or design solutions. Fully implementing this new vision, embodied in the Framework and the resultant Next Generation Science Standards (NGSS) [2], presents significant challenges, particularly in the area of assessment practices.
One of the main challenges is the need to develop new assessment tasks that align with the standards and with the knowledge-in-use learning the Framework and the NGSS promote, termed Next Generation Science Assessment (NGSA). Knowledge-in-use refers to applying scientific knowledge to figure out problems and design solutions. Traditional standardized tests, which often rely on multiple-choice questions, may not fully capture the depth and breadth of knowledge and skills NGSA requires. Moreover, developing NGSA is labor intensive, as tasks must accurately reflect the knowledge-in-use learning the standards promote. NGSA tasks are usually performance based and require constructed responses, which are time-consuming and costly to grade. Without timely grading, teachers cannot use the results for instructional decision making. Customized learning support that accounts for students' backgrounds and needs is critical for creating equitable and inclusive science learning, yet evaluating student performance and providing feedback under the Framework and the NGSS is difficult, because traditional evaluation methods may not fully capture the depth and breadth of knowledge and skills the standards require. Finally, the lack of appropriate resources and materials to support these assessment practices is itself a challenge.
To meet these challenges, it is critical to draw on cutting-edge technology, such as artificial intelligence (AI). Recent developments in AI have shown great potential to tackle these challenges [3, 4], for example in assessing students' investigations [5], scientific modeling [6], and argumentation [7]. This article focuses on ChatGPT, a large AI language model released by OpenAI in late November 2022 [8], and its potential to address these challenges and promote next generation science learning.
Automatic Next Generation Science Learning via ChatGPT
ChatGPT is a type of AI that uses deep learning techniques to generate human-like text. The model is trained on a vast corpus of text data, allowing it to understand and respond to a wide range of natural language inputs. ChatGPT can be fine-tuned to perform specific tasks, such as question answering, language translation, and text generation. Its ability to understand and respond to natural language input makes it a versatile tool that can be applied to various fields, such as education [9]. ChatGPT has the potential to help solve many of the challenges related to assessment practices in implementing the Framework and the NGSS.
Automatically generating NGSA tasks. ChatGPT can assist in automating the creation of assessment items. It is thus critical to examine whether ChatGPT can generate questions, prompts, and tasks that align with the Framework and the NGSS. The capacity to do so can save educators time and effort in creating assessment items, and can also increase their quality by ensuring they are aligned with the standards. To test this capacity, I first selected a performance expectation (MS-PS2-4) from the NGSS: "Construct and present arguments using evidence to support the claim that gravitational interactions are attractive and depend on the masses of interacting objects." This performance expectation requires students to engage in the scientific practice of argumentation while using the disciplinary core idea of types of interactions and the crosscutting concept of systems and system models.
I developed a prompt for ChatGPT: "Develop an assessment task with phenomena that relate to middle school students' life to assess their performance expectation: Construct and present arguments using evidence to support the claim that gravitational interactions are attractive and depend on the masses of interacting objects." Figure 1 presents the assessment item generated by ChatGPT, which includes a title, directions, and evaluation criteria. The task draws on solar system phenomena that students are familiar with. The directions present a scenario and ask students to assume the role of a scientist studying the behavior of gravitational interactions in the solar system. The assessment specifically asks students to discuss the relationship between mass and gravitational interaction by studying the orbits of the planets in the solar system, and it highlights the need for evidence from experiments or real-world examples to support claims. In addition, the item poses three questions, closely tied to the task, that students must address in their essay. The five evaluation criteria cover the evidentiary requirements for argumentation and the scientific rigor of the claims.
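As a sketch of how this step could be scripted at scale, the prompt above can be assembled programmatically and sent to a chat model. The helper names and the model identifier below are illustrative assumptions, not part of the article's actual workflow, and the API call follows OpenAI's chat interface:

```python
# Illustrative sketch: assembling the assessment-generation prompt and
# sending it to a chat model. Function names and the model id are
# assumptions for illustration only.

PERFORMANCE_EXPECTATION = (
    "Construct and present arguments using evidence to support the claim "
    "that gravitational interactions are attractive and depend on the "
    "masses of interacting objects."
)

def build_task_prompt(expectation: str) -> str:
    """Compose the task-generation prompt quoted in the article."""
    return (
        "Develop an assessment task with phenomena that relate to middle "
        "school students' life to assess their performance expectation: "
        + expectation
    )

def generate_task(expectation: str) -> str:
    """Send the prompt to a chat model. Requires the openai package and an
    OPENAI_API_KEY in the environment; imported lazily so the prompt
    builder works without it."""
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[{"role": "user", "content": build_task_prompt(expectation)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(build_task_prompt(PERFORMANCE_EXPECTATION))
```

Wrapping the performance expectation in a reusable builder makes it straightforward to generate tasks for other NGSS performance expectations by swapping in a different expectation string.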
Automatic grading of student-written responses. ChatGPT can assist in evaluating and reporting student performance under the Framework and the NGSS. It can be used to analyze student responses to assessments and provide feedback on their understanding of the science concepts they are learning, as well as identify areas of strength and weakness. Furthermore, ChatGPT can generate reports on student performance that can be shared with parents, administrators, and other stakeholders.
To test the automatic grading function, I sent this written response to ChatGPT: "[U]sing the criteria to rate this essay: Solar system includes the sun, earth, and other planets. Sun revolves around the earth, but the other planets revolve around Sun. This is because I saw the sun rise in the morning and set at night. However, the other planets are not. There must be forces between sun and the earth so that the sun can revolve around the earth instead of leaving the earth. This force should be fixed no matter the mass of earth and the sun. There is no evidence that the force would be associated with their distance." As an expert in science education, I find the grading fairly accurate, as the writing does not meet the performance expectation (see Figure 2). Moreover, ChatGPT pointed out the major errors in the writing and criticized the use of evidence and the overall coherence of the argument. Such a response offers students useful feedback on their progress toward the NGSS performance expectation.
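The grading step amounts to pairing a rubric with a student response in a single prompt. The sketch below shows one way this could be scripted; the helper name is an assumption, and the criteria are placeholders, since the actual five criteria appear in Figure 1:

```python
# Illustrative sketch: embedding rubric criteria and a student essay in
# one grading prompt. The criteria strings are placeholders; the actual
# five criteria are those ChatGPT generated (Figure 1).

def build_grading_prompt(criteria: list[str], essay: str) -> str:
    """Compose a grading request that embeds the rubric and the response."""
    rubric = "\n".join(f"- {c}" for c in criteria)
    return (
        "Using the criteria to rate this essay.\n\n"
        f"Criteria:\n{rubric}\n\n"
        f"Essay:\n{essay}"
    )

# Placeholder rubric and a truncated stand-in for the student essay.
criteria = [
    "The claim is scientifically accurate",
    "Evidence from experiments or real-world examples supports the claim",
]
essay = "Solar system includes the sun, earth, and other planets. ..."
print(build_grading_prompt(criteria, essay))
```

Because the rubric travels with every request, each response is graded against the same criteria, which supports the consistent, timely feedback the article argues teachers need for instructional decision making.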
Automatically generated personalized learning guidance. Another way ChatGPT can help is by providing students with personalized learning guidance. By offering students targeted guidance and resources, ChatGPT can support self-directed learning as students develop the knowledge and skills needed to meet the NGSS performance expectations. To test this capacity, I developed the prompt: "Please give learning guidance to help me improve my science learning to meet the performance expectation: Construct and present arguments using evidence to support the claim that gravitational interactions are attractive and depend on the masses of interacting objects." The system automatically generated six pieces of learning guidance, covering understanding the fundamental principles of gravity, the solar system, and the motion of planets; conducting research and experiments; practicing writing and presenting; seeking feedback; and staying curious. These points are critical for the student to improve learning (see Figure 3).
Further, I asked ChatGPT to recommend learning materials, which resulted in six materials covering the different key knowledge points the student needs (see Figure 4).
Customized recommendations for special needs. It is critical that the recommendations can meet the needs of every student, especially those with learning disabilities, to ensure equitable and inclusive learning. Thus, I further told ChatGPT that I have dyslexia and asked for help with learning materials. ChatGPT helpfully recommended an audiobook, Khan Academy for video lectures, two interactive simulations, and games. It also suggested using assistive technology, such as text-to-speech software, and joining tutoring or support groups. These recommendations are beneficial for students with special learning needs (see Figure 5).
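This customization works because the follow-up request is interpreted in the context of the earlier exchange. One way to script that is to accumulate the turns in the chat-message format, appending learner context (here, dyslexia) as a new user turn; the content strings below paraphrase the article's exchange and the structure is a sketch, not the author's actual code:

```python
# Illustrative sketch: carrying learner context across turns by
# accumulating chat messages. Content strings paraphrase the article's
# exchange; parenthesized strings stand in for the full model replies.

conversation = [
    {"role": "user", "content": (
        "Please give learning guidance to help me improve my science "
        "learning to meet the performance expectation: Construct and "
        "present arguments using evidence to support the claim that "
        "gravitational interactions are attractive and depend on the "
        "masses of interacting objects."
    )},
    {"role": "assistant", "content": "(six guidance points; see Figure 3)"},
    {"role": "user", "content": (
        "I have dyslexia. Please recommend learning materials that fit "
        "my needs."
    )},
]

def add_turn(conversation: list[dict], role: str, content: str) -> None:
    """Append a turn so later requests see the full learner context."""
    conversation.append({"role": role, "content": content})

add_turn(conversation, "assistant",
         "(audiobook, video lectures, simulations, games; see Figure 5)")
```

Sending the whole `conversation` list with each request is what lets the model tailor its later recommendations to the accessibility needs stated earlier.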
In conclusion, ChatGPT has the potential to significantly aid educators and school systems in addressing the challenges related to assessment practices in implementing the Framework and the NGSS. By automatically generating assessments, grading students' written constructed responses or essays, and providing learning guidance and materials, ChatGPT can save educators time and effort. These pilots evidence the potential of AI to improve the efficiency of assessment development, provide customized learning support, and assist in evaluating and reporting student performance [10]. Despite this significant potential, ChatGPT, like other AI applications, has limitations [11], and caution is needed.
First, ChatGPT cannot be a substitute for teachers. ChatGPT has demonstrated impressive capabilities in natural language processing tasks, such as automatic assessment development, grading, guidance, and recommendations for science learning. However, it is important to recognize that ChatGPT cannot fill the role of a teacher. While ChatGPT can provide targeted and relevant information, it lacks the ability to provide emotional support and to facilitate critical thinking and problem-solving skills in science learning. Additionally, the interactive nature of science teaching requires a human touch that a machine cannot replicate. Therefore, ChatGPT and other AI models should be viewed as tools that enhance and supplement the work of teachers rather than as replacements for them.
Second, teachers need professional knowledge to use ChatGPT for instructional purposes. The use of ChatGPT and other AI models in education has the potential to enhance and supplement instructional practices. However, for teachers to effectively utilize ChatGPT in the classroom, they must possess the necessary professional knowledge. This includes understanding the capabilities and limitations of the technology, as well as the appropriate pedagogical practices for integrating it into instruction. Teachers must also be able to evaluate the quality and relevance of the information provided by ChatGPT and make informed decisions about its use in the classroom. Furthermore, teachers need to know how to use the technology in a way that aligns with their curriculum and learning objectives [11]. Without proper professional knowledge, teachers may not be able to fully realize the potential of ChatGPT and other AI models in the classroom. Thus, teacher education programs should provide opportunities for teachers to develop the necessary knowledge and skills to use ChatGPT and other AI models for instructional purposes.
Third, the "black box" of ChatGPT, in terms of how it generates automatic results, needs to be made explainable so students, teachers, and parents can fully appreciate its merit. Like many AI models, ChatGPT can produce accurate and relevant information, yet it is difficult to understand how it arrived at its conclusions [12]. This is a limitation for instructional use, as students, teachers, and parents may be less inclined to trust the model's output without a clear understanding of its inner workings. To overcome this limitation, researchers and developers should make the model more explainable by providing insight into the process and reasoning behind its outputs, using techniques such as model interpretability, feature attribution, and visualization. Greater transparency into how ChatGPT generates its results helps students, teachers, and parents understand the model's capabilities and limitations, which can increase trust and confidence in its use for instructional purposes.
Last, teachers have to pay attention to the potentially biased results ChatGPT may generate in order to ensure equitable and inclusive learning. This article does not examine potential bias in the automated procedures described above. However, because ChatGPT is trained on large amounts of data, it can reproduce and amplify the biases present in that data [13]. This is a concern for instructional use, as biased results can lead to inequitable and exclusive learning experiences. For example, if the model is trained on data that does not represent diverse perspectives, it may generate results that are not inclusive or culturally sensitive. Teachers therefore need to be aware of the limitations of the training data, monitor the results for potential bias, and be able to evaluate the quality and relevance of the information ChatGPT provides before deciding how to use it in the classroom. Teachers can also treat ChatGPT's output as a starting point rather than a final result, adding their own knowledge and understanding to make it more inclusive. On the development side, it is crucial to train the model on a diverse and inclusive dataset, and to regularly evaluate and update that data, so the model generates results that are fair, unbiased, and culturally sensitive. In summary, while ChatGPT can enhance and supplement instructional practices, teachers must attend to the potential biases in its results to ensure equitable and inclusive learning.
[1] National Research Council. A Framework for K-12 Science Education: Practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, D.C., 2012.
[2] NGSS Lead States. Next Generation Science Standards: For states, by states. The National Academies Press, Washington, D.C., 2013.
[3] Zhai, X., Yin, Y., et al. Applying machine learning in science assessment: A systematic review. Studies in Science Education 56, 1 (2020), 111–151.
[4] Zhai, X., Haudek, K.C., et al. From substitution to redefinition: A framework of machine learning-based science assessment. Journal of Research in Science Teaching 57, 9 (2020), 1430–1459.
[5] Maestrales, S., Zhai X., et al. Using machine learning to score multi-dimensional assessments of chemistry and physics. Journal of Science Education and Technology 30, 2 (2021), 239–254.
[6] Zhai, X., He, P., et al. Applying machine learning to automatically assess scientific models. Journal of Research in Science Teaching 59, 10 (2022), 1765–1794.
[7] Zhai, X., Haudek, K., et al. Assessing argumentation using machine learning and cognitive diagnostic modeling. Research in Science Education 53, 2 (2023).
[8] Assaraf, N. OpenAI's ChatGPT: Optimizing language models for dialogue. cloudHQ Blog. December 8, 2022; https://blog.cloudhq.net/openais-chatgpt-optimizing-language-models-for-dialogue
[9] Zhai, X. ChatGPT user experience: Implications for education. SSRN. December 27, 2022; http://dx.doi.org/10.2139/ssrn.4312418
[10] Zhai, X. Practices and theories: How can machine learning assist in innovative assessment practices in science education. Journal of Science Education and Technology 30, 2 (2021), 139–149.
[11] Zhai, X., Krajcik, J., et al. On the validity of machine learning-based Next Generation Science Assessments: A validity inferential network. Journal of Science Education and Technology 30, 2 (2021), 298–312.
[12] Zhai, X., Shi, L., et al. A meta-analysis of machine learning-based science assessments: Factors impacting machine-human score agreements. Journal of Science Education and Technology 30, 3 (2021), 361–379.
[13] Zhai, X. and Krajcik, J. Pseudo AI bias. arXiv preprint arXiv:2210.08141. 2022; https://arxiv.org/abs/2210.08141
Xiaoming Zhai is an assistant professor of science education and artificial intelligence, and the director of AI4STEM Education Center at the University of Georgia. His research focuses on applying AI in facilitating science teaching and learning.
Figure 1. Automatic assessment development via ChatGPT.
Figure 2. Automatic grading and feedback via ChatGPT.
Figure 3. Automatic learning guidance via ChatGPT.
Figure 4. Automatic recommendation of learning materials via ChatGPT.
Figure 5. Customized automatic recommendation for learning materials to meet special needs via ChatGPT.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.