When it comes to AI in math instruction, I'm struck by a glaring paradox. On one hand, AI makes groundbreaking strides in mathematical research and complex computations. On the other, researchers, educators, and others deride AI as being "bad" at math. Consequently, reports of its mathematical abilities often seem strikingly contradictory. This dichotomy raises an important question: What exactly is AI capable of in mathematics, and how can we leverage its strengths while mitigating its weaknesses?
This contradiction isn't merely anecdotal. A 2022 study by Drori et al., published in PNAS, tested GPT-3 on a dataset of 1,000 mathematical problems drawn from MIT's 18.01 Single Variable Calculus course. The results were sobering: GPT-3 achieved only an 18% accuracy rate on these college-level problems. A 2023 article in the Wall Street Journal, citing researchers at Stanford University and the University of California, reported that AI not only struggles with math but is "getting dumber" at it. A subsequent 2024 WSJ article reported that the highly touted Khanmigo AI assistant for students can struggle with even basic math.
Conversely, only a month ago, the New York Times reported that Google's AlphaGeometry AI solved “geometry olympiad” problems at a level comparable to the most mathematically attuned high-school students in the world. Some of the problems are so complicated, reports Scientific American, that even experts cannot solve them. Yet AlphaGeometry solved the olympiad problems at a success rate similar to that of human gold medalists. Furthermore, reports have circulated that OpenAI has developed a new LLM called Q* that excels at performing grade-school math, but OpenAI has not commented on the reports.
Unraveling the Paradox
To me, these are the keys to unraveling this paradox:
General AI Chatbots and Mathematical Limitations
General AI chatbots like ChatGPT weren't built for math. Common Large Language Models (LLMs) can do some math, but "not reliably," according to a report from the MIT Technology Review. This is because general AI LLMs don't have the appropriate algorithms "or even the right architecture" to solve math problems reliably.
A fundamental issue is that while AI can solve complex problems, it doesn't truly "understand" mathematics as humans can. LLMs operate based on pattern recognition and statistical correlations, not fundamental mathematical reasoning.
The GPT-3.5 vs. GPT-4o Divide
A second part of the apparent contradiction regarding AI's abilities in math stems from the rapid evolution of AI models, particularly the shift from GPT-3.5 to GPT-4o (Omni). GPT-3.5 often struggles with consistent mathematical accuracy. It can make basic arithmetic errors, misinterpret word problems, and provide inconsistent explanations.
In contrast, GPT-4o, released in May 2024, represents a substantial leap forward. It demonstrates much higher accuracy in calculations, improved problem-solving capabilities, and more consistent mathematical reasoning.
The immediate issue for schools is that most AI-in-education tools use GPT-3.5. (Many schools ban student use of ChatGPT, and with it access to GPT-4o.) Reliance on the inferior GPT-3.5 helps perpetuate the belief among some educators that AI is bad at math.
Many reviews of AI's mathematical abilities predate the release of GPT-4o. As such, they strike me as largely inconsequential and often misleading. Even the criticism of AI abilities in the aforementioned MIT Technology Review report should be taken with a grain of salt; it was published in November 2023.
GPT-4o's Varying Mathematical Proficiency
GPT-4o demonstrates varying levels of proficiency across different areas of mathematics. One way to understand this is to consider GPT-4o's abilities across a spectrum of mathematical skills:
AI Math Ability Spectrum
Calculation and Computation:
Strength: There appears to be broad agreement that GPT-4o has strong computational abilities, particularly in solving numerical calculations and performing complex mathematical operations quickly. It excels in basic to intermediate arithmetic, algebraic manipulations, and some calculus-based computations.
Weakness: GPT-4o can occasionally make simple calculation errors, especially when interpreting problems involving more nuanced arithmetic. One issue is that it doesn’t possess an inherent number sense to recognize when a calculation makes sense.
Problem-Solving:
Strength: GPT-4o demonstrates strong capabilities in solving multi-step problems and can explain its reasoning clearly.
Weakness: GPT-4o may struggle with problems that require abstract reasoning, creative problem-solving, or an innovative approach outside standard methodologies.
Conceptual Understanding:
Strength: Advanced AI can explain mathematical concepts in multiple ways, adapting to different learning styles.
Weakness: GPT-4o sometimes falls short in conveying deep conceptual understanding. It may focus more on procedural accuracy rather than exploring the underlying principles or offering insight into the "why" behind a mathematical concept.
Mathematical Reasoning:
Strength: GPT-4o can construct logical arguments and proofs, showing improvement in mathematical reasoning.
Weakness: GPT-4o can sometimes use flawed logic or make unjustified leaps in reasoning, especially in more abstract mathematical domains.
Visual and Spatial Reasoning:
Strength: Some AI models can interpret and generate visual representations of mathematical concepts.
Weakness: GPT-4o struggles with sophisticated spatial reasoning and with interpreting spatial relationships in non-standard ways. Its understanding is limited to what can be described textually or through simple diagrams.
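The number-sense weakness noted under Calculation and Computation suggests a simple safeguard: substitute the answer an AI tool reports back into the original problem. Here is a minimal Python sketch of that idea. The function name and the sample equation are my own illustrations (assumptions, not taken from any AI tool), and the check covers only linear equations of the form ax + b = c.

```python
from fractions import Fraction

def satisfies_linear_equation(a, b, c, claimed_x):
    """Return True if claimed_x solves a*x + b = c exactly.

    Hypothetical helper for illustration. Decimal answers should be
    passed as strings (e.g. "6.5") so they are converted exactly.
    """
    x = Fraction(claimed_x)
    return Fraction(a) * x + Fraction(b) == Fraction(c)

# Hypothetical scenario: a chatbot reports two different answers
# to the equation 3x + 7 = 25.
print(satisfies_linear_equation(3, 7, 25, 6))      # True: 3*6 + 7 = 25
print(satisfies_linear_equation(3, 7, 25, "6.5"))  # False: 3*6.5 + 7 = 26.5
```

Exact fractions are used instead of floating-point numbers so that a correct answer is never rejected because of rounding.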
Specialized AI Models for Mathematics
Finally, Google's AlphaGeometry is a specialized AI model designed and trained specifically for solving complex geometry problems. Its training involved extensive exposure to high-level mathematical reasoning, proof strategies, and problem-solving techniques of the kind found in Math Olympiads. This specialized focus, unlike that of common LLMs, allows it to excel in a very narrow domain, much like a human expert who has spent years honing their skills in a very specific area.
The takeaway?
AI can indeed excel at high-level math. But not the AI that you and your students are currently using.
Implications for Secondary School Mathematics
Here are some implications for commonly taught secondary-level math subjects: Pre-Algebra, Algebra, Geometry, Pre-Calculus, and Calculus.
GPT-4o has strong computational abilities and is generally effective in areas like Pre-Algebra and Algebra, where precise calculation and computation are crucial. But earlier models like GPT-3.5 often make simple calculation errors, struggling even with basic arithmetic. So, avoid using GPT-3.5.
GPT-4o can solve multi-step problems, which makes it helpful in subjects like Algebra and Geometry, where step-by-step problem-solving is crucial. However, its performance may falter in more abstract areas like Pre-Calculus and Calculus, where creative or non-standard solutions might be required.
GPT-4o can explain mathematical concepts in various ways, which can be tailored to different learning preferences. It often excels in providing clear, step-by-step procedural explanations. As such, AI can serve as a proficient math tutor. (Not surprisingly, many AI tools purport to do just that.) Yet, while GPT-4o can be a valuable tool for explaining procedures in Pre-Algebra and Algebra, it might be less effective in advanced Geometry and Calculus, where deep conceptual understanding is necessary.
Mathematical reasoning is a relative weakness of GPT-4o. It is likely to perform well in Algebra and Geometry, where logical progression is key, but it may struggle in Pre-Calculus and Calculus if tasks require rigorous proof construction and abstract reasoning.
GPT-4o also struggles with spatial reasoning or tasks such as interpreting spatial relationships in non-standard ways. GPT-4o's limitations will likely be more apparent in advanced Geometry and Calculus, where visual and spatial reasoning is critical.
In all, Pre-Algebra and Algebra teachers can feel relatively confident introducing GPT-4o into their instruction with students. Introductory Geometry teachers should also find GPT-4o generally beneficial. That said, advanced Geometry, Pre-Calculus, and Calculus teachers should proceed with caution, especially when it comes to tasks requiring high levels of abstraction, creativity, or visual-spatial insight.
Whatever the subject, all signs point to a need for a nuanced approach to integrating AI in math education.
Approaches to Using Top AI Tools in Math Instruction
Drawing from my book chapter on approaches to using AI tools in math instruction, here are some strategies for effective integration:
Identify Targets of Difficulty: To identify the best use, focus on mathematical concepts that are challenging to teach, crucial for students to understand, and where AI can provide valuable support.
Generate Personalized Content: Use AI to create customized explanations, examples, and practice problems based on individual student needs.
Explore Advanced Concepts: Leverage powerful computational tools like Wolfram Alpha in conjunction with AI to help students visualize and understand complex mathematical ideas. Use tools like MathGPT Pro and Wolfram GPT to explore mathematical concepts, ask questions, and seek explanations.
Facilitate Inquiry-Based Learning: Encourage students to use an advanced LLM to explore mathematical concepts, ask questions, and seek explanations.
Promote Exploration and Discovery: Incorporate dynamic, interactive tools like Desmos alongside AI to help students develop an intuitive understanding of mathematical relationships by manipulating graphs, equations, and parameters.
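One caution about generating personalized practice problems: an answer key produced by a chatbot can itself contain errors. A hedged alternative for routine problem types is to build each problem backward from its solution, so the key is correct by construction. The Python sketch below illustrates this for one-variable linear equations. The function name, number ranges, and problem format are all my own assumptions, not part of any tool mentioned above.

```python
import random

def make_linear_problems(count, seed=0):
    """Build 'Solve for x' problems of the form ax + b = c.

    Hypothetical helper for illustration. Each problem starts from a
    chosen integer solution x, so the answer key is known in advance.
    Returns a list of (problem_text, solution) pairs.
    """
    rng = random.Random(seed)  # a fixed seed makes worksheets reproducible
    nonzero = [n for n in range(-9, 10) if n != 0]
    problems = []
    for _ in range(count):
        a = rng.choice(nonzero)   # coefficient of x, never zero
        x = rng.randint(-10, 10)  # the intended solution
        b = rng.randint(-20, 20)
        c = a * x + b             # right-hand side follows from the solution
        sign = "+" if b >= 0 else "-"
        text = f"Solve for x: {a}x {sign} {abs(b)} = {c}"
        problems.append((text, x))
    return problems

# Print a short worksheet together with its answer key.
for text, solution in make_linear_problems(3, seed=42):
    print(f"{text}    (answer: x = {solution})")
```

A teacher could paste problems built this way into an AI tool and ask it to tailor the wording or context for individual students, while keeping the verified answers separate.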
AI Beyond Procedural Math
Math specialist Dan Meyer raises important concerns about GPT-4o in math instruction, particularly about its focus on procedural steps rather than conceptual understanding. That said, there are opportunities to leverage AI for conceptual understanding:
Multiple Representations: Advanced AI like GPT-4o can present mathematical concepts in various ways, catering to different student needs.
Scaffolded Exploration: AI can guide students through a process of discovery, asking questions that lead to conceptual insights, rather than just providing step-by-step solutions.
Real-World Connections: AI can generate relevant, real-world applications of mathematical concepts, helping students see the broader significance of what they're learning.
Specialized AI for Mathematics
The success of Google's AlphaGeometry and the rapid progression of AI abilities indicate that specialized AI models for mathematics will likely become available in the near future. When, and at what cost, I can only venture a guess.
At the moment, GPT-4o is not allowed in the hands of high school students, and GPT-5o is set to be released soon, with potential improvements in math abilities. A “mini” GPT-4 level LLM is being made available to companies, raising the hope of higher-performing AI in education tools.
The AI math paradox clearly presents both opportunities and challenges. The key, I believe, lies in understanding the spectrum of LLM abilities and the significant differences in ability between GPT-3.5 and GPT-4o. As educators, we must stay informed about these developments and critically evaluate AI tools for use in our classrooms, always remembering that LLMs differ significantly in power and accuracy.
AI Tools and News:
Bytelearn - AI “co-teacher” for grades 6-12
KardsAI - AI flashcard maker
PopAI - your personal AI workspace
New Amplify Desmos Math curriculum
TEACHERS ON AI - Poster from MIT Teaching Systems Lab
Understanding How AI Works Makes It More Effective in Lesson Planning - Edutopia
Researchers combat AI hallucinations in math - The Hechinger Report
California passes bill to limit student cell phone use on K-12 campuses - EdSource
Which States Ban or Restrict Cellphones in Schools? - EdWeek
Responsible AI Adoption - TCEA
What do schools need to know about AI paraphrasing detection tools? - K12dive
MagicSchool’s newest tool, "Presentation Generator"