TY - GEN
T1 - A Systematic Evaluation of Code-generating Chatbots for Use in Undergraduate Computer Science Education
AU - Torek, Adam
AU - Sorensen, Elijah
AU - Hahle, Natalie
AU - Kennington, Casey
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This research paper focuses on evaluating code-generating chatbots. Chatbots like ChatGPT, released in the past three years, have proven capable of a wide variety of tasks within a conversational interaction, including writing code and answering code-related questions. With these recent advances, chatbots have many potential uses in education, including computer science education. However, before these chatbots are used in CS curricula, their capabilities and limitations must be systematically tested and understood. In this work, we evaluate the capabilities and limitations of four well-known, open-source, code-based chatbots on programming tasks by performing a standardized study in which different chatbots are tasked with providing answers to a variety of assignments from Boise State University's computer science program. We found that while all of the chatbots can write code and provide explanations, some do better than others, and each of them works differently in conversations. Moreover, all of them suffered from similar, important limitations, which has implications for their adoption in the curriculum. As a second experiment, we used the Llama chatbot to perform a human evaluation, enabling novice and experienced student programmers to use it as a coding assistant to complete specific tasks in a common software development environment. We found that the coding assistant can help novice programmers accomplish simple tasks in comparable time and with comparable code efficacy to more experienced programmers. Given these experiments, and given feedback from participants in our studies, a clear picture emerges: new programmers should learn important programming concepts without the help of code assistants so that students can (1) demonstrate their understanding of important concepts and (2) gain enough experience to assess code-assistant output as useful or erroneous. Then, once intermediate skills are mastered (e.g., object-oriented programming and data structures), it seems appropriate to systematically introduce students to coding assistants to help with specific assignments throughout the undergraduate computer science curriculum. We conclude by addressing ethical considerations for the use of code-based chatbots in computer science education and future directions for research.
KW - chatbot
KW - code copilot
KW - large language models
UR - http://www.scopus.com/inward/record.url?scp=105000814686&partnerID=8YFLogxK
U2 - 10.1109/FIE61694.2024.10893165
DO - 10.1109/FIE61694.2024.10893165
M3 - Conference contribution
AN - SCOPUS:105000814686
T3 - Proceedings - Frontiers in Education Conference, FIE
BT - 2024 IEEE Frontiers in Education Conference, FIE 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 54th IEEE Frontiers in Education Conference, FIE 2024
Y2 - 13 October 2024 through 16 October 2024
ER -