From Coverage to Reasoning: Rebuilding ESS314 with Agents
A Warning Back in 2025
We were told a year ago: all course materials must be ADA compliant by April 2026. The requirement is reasonable, since accessibility is not optional, but the scope of what it implied for any course, including our UW ESS314, was not small. Of course, I postponed it until the last minute.
The course existed as thirty slide decks, co-designed with Harold Tobin over several years. They were built from screenshots of textbooks that are not open-access, research figures from scientific literature with missing captions, and equations typeset inconsistently across lectures delivered by different hands over different years. Adding alt-text one figure at a time was not a solution. The figures were inaccessible in a more fundamental sense: screenshots of copyrighted material, uncaptioned, unlinked to sources. Fixing them properly meant replacing them.
I had been migrating the Python labs away from proprietary software but had postponed any broader rethinking of the course. The compliance deadline forced a decision.
The Architect Under Three Pressures
ESS314 sits at a transition point in the curriculum, between foundational science and upper-division specialties, and attempts to cover three distinct pillars in ten weeks: geophysics for geodynamics, geophysical hazard characterization, and subsurface imaging. Each pillar could anchor an entire course. Together, they produce a course that is either genuinely synthetic or, depending on your view, covering too much ground at insufficient depth.
The textbook we use is not open-access, though an older edition is available through the UW library’s digital reading room at no cost to students. Over the years, the slides had grown increasingly self-contained to reduce reliance on the book. In doing so, I had inadvertently built a course that no longer clearly needed a textbook, while remaining in a format — individual PDF slide decks — that was neither accessible, version-controlled, nor easily updated.
There was a quieter problem I had been avoiding. Standards drift. Most faculty would acknowledge this honestly: across the pandemic years and those following, mathematical rigor in our courses softened. Notation became inconsistent across lectures. Derivations that had once been worked through carefully became gestures. I had noticed this in my own slides without doing anything about it.
Three simultaneous pressures — accessibility compliance, pedagogical coherence, and updating the course for a labor market that has materially changed since it was designed — meant rebuilding, not patching.
Two Platforms for the Best of Both Worlds
The rebuilding happened across two platforms, each doing something distinct.
Claude.ai for content architecture. I built a Claude project with a knowledge base containing the syllabus as a Word document, the full set of PDF slide decks, and a set of standing instructions I had iteratively refined over several sessions. Those instructions encode the pedagogical constraints: preserve the physical reasoning exactly, harmonize mathematical notation to a consistent standard across all lectures, flag every derivation step that was present in the original but missing a logical bridge, and never silently drop content. The knowledge base gave Claude access to the whole course at once — not one lecture in isolation, but the entire arc — which is what made the structural problems visible.
GitHub Copilot for engineering. Converting a course into a JupyterBook is a software engineering problem: directory structure, _toc.yml configuration, Docker containerization, cross-reference syntax, pre-commit hooks, CI/CD for automated HTML builds. Copilot works natively inside VS Code and handles this in a way that feels different from asking a chat interface: it understands the repository context, follows the file structure already in place, and applies software engineering conventions I do not have to specify from scratch. I prompted it to set up the JupyterBook scaffolding, configure the build pipeline, and enforce markdown-to-HTML conversion as a pre-commit hook. A single prompt. Best practices applied, not specified.
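The scaffolding Copilot sets up is, at its core, a pair of YAML files. A minimal sketch of the table of contents, following standard JupyterBook conventions (the lecture paths here are hypothetical placeholders, not the actual ESS314 layout):

```yaml
# _toc.yml — defines the book's chapter order
format: jb-book
root: intro                    # intro.md becomes the landing page
chapters:
  - file: lectures/lecture01   # hypothetical paths; adjust to the repo layout
  - file: lectures/lecture02
  - file: labs/lab01
```

Running `jupyter-book build .` then renders every listed file, in order, into the HTML site.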
The division of labor is part legacy, part deliberate. Claude holds the content and the pedagogical specification (Claude Code can also be integrated). Copilot handles the engineering and the repository conventions; its educational subscription is generous, and I get more tokens and fewer rate limits there. Together, they cover the build.
What the Conversion Surfaced
Two things emerged during the Claude-assisted conversion that I had not anticipated.
First, the pedagogical arc was less coherent than I had believed. Reading lectures as sequential structured text — rather than as visual decks to be presented in person — made logical gaps visible. A concept introduced in lecture three assumed notation not established until lecture seven. Seismological lectures did not share the same conventions for Vp and Vs. There was no explicit statement bridging our three goals of understanding geodynamics, hazards, and resources, leaving students without ever being told explicitly why we were doing this. The agent surfaced these inconsistencies mechanically. Recognizing them as pedagogical problems, and deciding what to do about them, required human judgment. But the format change made the diagnosis possible.
Second, harmonizing notation across thirty lectures took a single structured prompt. Rebuilding the bridge between pillars took an afternoon. Neither would have been fixable at all while working inside individual slide decks.
Geophysical Reasoning
While the conversion was in progress, I ran a separate experiment. Without giving Claude any context about ESS314, I asked it to analyze job market expectations for geoscience graduates in 2026: what skills employers were prioritizing, what salary ranges the field was producing at the BS versus graduate level, what a ten-week survey geophysics course should cover to serve both a grad-school pipeline and a direct-to-career path from a BS.
The resulting syllabus matched our existing one reasonably well. The core topics are not going anywhere. But one recommendation from that analysis has stayed with me. When I described the course framing — survey of geophysics, a format that has organized these courses for decades — Claude suggested a different organizing principle: geophysical reasoning. Not what methods exist, but how a geophysicist thinks.
The distinction struck me as precisely right for 2026.
When the internet arrived, Wikipedia commoditized knowledge. A student who had memorized the difference in Moho depth between continental and oceanic lithosphere held no particular advantage over one who could look it up in thirty seconds. Knowledge access collapsed as a differentiator. What remained scarce, and thus more valuable, was the capacity to reason with knowledge: to ask the right question, to know which method fits which problem, to read a gravity anomaly and form a defensible interpretation.
We are watching the same transition happen again, one layer up. Agentic AI pipelines built on large language models are commoditizing implementation and workflows. A student who can write a seismic inversion from scratch holds less of an advantage over one who can simply prompt for it than they did five years ago. What remains valuable is reasoning: setting up the problem correctly, evaluating whether the output is physically plausible, recognizing when a model is confidently wrong. Geophysical reasoning, not a survey of geophysics, is what this course needs to build. I also told the UW ESS 314 students on day one: AI can take your job if you do not understand fundamental concepts and think critically about the information in front of you. That is not just a rhetorical warning; it is now a course design principle.
That reframe also produced the one structural gap the market analysis identified: skills for students entering the workforce or grad school directly from a BS — technical communication for specialist and non-specialist audiences, fluency in processing and interpreting a dataset, AI literacy, navigating an open problem — had been underdeveloped. Our course had a dead hour mid-week, historically used as unofficial office hours. On Claude’s recommendation, I converted it into ten structured discussion sections: reading primary literature and connecting it to a societal question, open problem framing, peer mentoring for career development, and explicit AI literacy — how to use these tools in scientific workflows, and what they cannot do yet.
The Living Course
The JupyterBook is publicly available, zero cost, ADA compliant by construction, and version-controlled. I can update it in five minutes standing next to a student who asks a question I had not addressed. Here are a few notable examples. Once, a student asked about setting up GitHub Copilot for the lab assignments. I opened VS Code, prompted the agent to add VS Code/GitHub setup instructions to the book, and it made the changes and ran add-commit-push; the page was published in the book a few minutes later, all from that single prompt.
The second time was less routine. I was standing with my laptop, telling Copilot to handle the markdown-to-HTML conversion as a pre-commit hook. After it ran, I added a line to the standing instructions: always configure this hook when setting up a new notebook repository, without being asked, and fold this into the agent skill. The agent updated its own specification. The next repository I initialized, the hook was already there.
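For readers who want the same setup, that hook can be sketched as a local pre-commit hook. This is an assumption about a typical JupyterBook repository, not the exact configuration in ours; the hook id is a hypothetical name:

```yaml
# .pre-commit-config.yaml — rebuild the HTML book before each commit
repos:
  - repo: local
    hooks:
      - id: jupyter-book-build        # hypothetical id
        name: Build JupyterBook HTML
        entry: jupyter-book build .   # assumes jupyter-book is on PATH
        language: system
        pass_filenames: false
        always_run: true
```

With `pre-commit install` in place, every commit triggers a fresh HTML build, so the published site cannot drift from the source markdown.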
That moment sat with me for a while. There will be a version of this workflow — not far from where we are now — where I do not need to type that instruction, or stand there, or hold the laptop. The agent will have learned the pattern from context and stored it in memory. The update will propagate automatically across every relevant downstream task. Maybe one day, I will just talk to my agent without touching my keyboard…
Students are laptop-free during lectures. All derivations are in the book; they can return to them any time. Class is about the reasoning, not the transcription. They show up. More students ask questions. I have time to look each of them in the eye. It is too early to know whether this translates into deeper learning at scale, but the engagement is qualitatively different.
What the Forced Deadline Revealed
Accessibility compliance was a curriculum forcing function. The mandate to produce properly formatted, accessible materials surfaced inefficiencies that had been invisible inside slide decks — notation drift, scaffolding gaps, a syllabus designed for one student pathway and never explicitly updated for another. These problems were not new. They were invisible until the format change made them legible.
This may generalize. Compliance constraints, open-access aspirations, licensing pressures — these do not simply add overhead. They change the representation of course material in ways that make structural problems visible and fixable. The agent identified the arc problem not by analyzing pedagogy, but by converting thirty lectures into sequential structured text, which made gaps obvious that the slide format had hidden for years.
I did not rebuild ESS314 because I had a pedagogical vision waiting to be implemented. I rebuilt it because a compliance deadline made the existing design untenable. The rebuilt version is better.
An Open Question
The JupyterBook is now openly forkable. Another instructor could take it, adapt it, and have a working course in a week. That is the intended consequence of open licensing.
But the version they fork is a first-pass reconstruction, with untested discussion sections, a market analysis I have not yet validated against student outcomes, and a “reasoning over survey” framework that has been in the room for one quarter. What obligation comes with making a half-finished redesign publicly available?
Textbooks are published after years of iteration. Course materials have historically been invisible. A living JupyterBook occupies an intermediate category: faster than peer review, more structured than informal sharing, without the validation loop that produces reliable knowledge.
And then there is the deeper question that academics and teachers may ask. The agent that updated its own standing instructions based on a pattern I demonstrated once — at what point does the course itself become the agent’s artifact as much as mine? I set the pedagogical values. I control the original course and lecture content. I review the output. But the iteration is increasingly running without my hands on it. I am still working out what that means for authorship, for accountability. With full honesty and humility, I tell the students how the book is made and take accountability for the mistakes that AI and I make together.