LLMs Accurately Predict Educational and Psychological Outcomes from Childhood Essays - Articles of Education

Thursday, August 7, 2025


The Rise of Large Language Models and Their Impact on Predictive Analysis

Large language models (LLMs), which are advanced artificial intelligence systems designed to analyze and generate text in various human languages, have become increasingly prevalent over the past few years. These models have gained widespread attention since the launch of ChatGPT, which utilizes different versions of an LLM known as GPT. As a result, these AI tools have been adopted by individuals globally and have also found their way into professional and research environments.

Tobias Wolfram, a researcher with a Ph.D. in Sociogenomics from Bielefeld University, recently conducted a study focused on evaluating how effectively LLMs can predict people's educational and psychological outcomes by analyzing essays written during childhood. His findings, published in Communications Psychology, indicate that certain computational models can predict these outcomes with accuracy comparable to teacher assessments and significantly better than genetic data.

Wolfram shared his insights with Articles of Education, explaining that during his undergraduate studies, he was drawn to data that deviated from standard survey questions commonly used in social and behavioral sciences. He engaged in network analyses, web data scraping, and eventually delved into natural language processing. However, he noted the limitations of the tools available at the time, which were far from what is now possible with modern LLMs.

In 2020, when Wolfram began his Ph.D. in Sociogenomics, LLMs had only recently emerged following the public release of GPT-2 and GPT-3. Around the same time, he discovered a dataset containing extensive educational and psychological information for a large group of individuals born in the 1950s. This dataset included essays written by participants at age 11, which had just been digitized.

"Finding these essays was a unique opportunity," said Wolfram. "Reading them revealed a wide range of complexity, length, and grammatical accuracy. To a human eye, it was immediately obvious, but how well could we quantify this? And what does it mean for life outcomes?"

With support from his advisor and colleagues, Wolfram embarked on a study to explore the potential of LLMs in analyzing these essays. He used a model similar to those behind tools like ChatGPT to convert each essay into a complex numerical profile known as a 'text embedding.' This profile captured the meaning and style of the essays across over 1,500 dimensions. In addition, he extracted over 500 other metrics, such as lexical diversity, sentence complexity, readability, and the number of grammatical errors.

After extracting these features, Wolfram trained a machine learning model to predict outcomes from them. For this purpose, he employed an ensemble method called a "SuperLearner," which combines the predictions of multiple algorithms, such as Random Forests, neural networks, and support vector machines, into a single, more accurate final prediction. To evaluate the model's performance, he used 10-fold cross-validation: the model is repeatedly trained on nine-tenths of the data and tested on the remaining tenth it has never seen.
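This style of stacked ensemble can be sketched with scikit-learn's `StackingRegressor`, which, like a SuperLearner, fits a meta-model on the base learners' out-of-fold predictions. Everything here is an assumption for illustration: the synthetic data, the particular base learners and their settings, and the ridge meta-model are placeholders, not the study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Toy stand-in for the essay features: 200 samples, 20 dimensions.
# (The study used ~1,500 embedding dimensions plus 500+ metrics.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=200)

# A stacked ensemble in the spirit of a SuperLearner: each base
# learner's out-of-fold predictions feed a ridge meta-model.
ensemble = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
        ("svm", SVR()),
        ("mlp", MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                             random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,  # internal folds for the meta-features (chosen for speed here)
)

# 10-fold cross-validation, as in the study: every prediction comes
# from a model that never saw that sample during training.
y_pred = cross_val_predict(ensemble, X, y, cv=10)
```

The key design point is that the meta-model only ever sees out-of-fold predictions, so the ensemble's weights are not inflated by base learners memorizing their training data.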

To assess the predictive power of the models, Wolfram primarily relied on a metric known as "predictive holdout R²." This measure quantifies how much of the variation in an outcome, such as cognitive ability or educational attainment, a model can explain in new data, relative to simply guessing the average value. A score of 0.6, for example, would indicate that the model explains 60% of the variance.
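In its simplest form, R² is one minus the ratio of the model's squared prediction errors to the squared deviations from the mean. The minimal sketch below uses the holdout set's own mean as the baseline; the study's exact variant may differ in details such as using the training-set mean.

```python
def holdout_r2(y_true, y_pred):
    """Fraction of variance in unseen data explained by the model,
    relative to the baseline of always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Perfect predictions score 1.0; always predicting the mean scores 0.0; a model that does worse than the mean can score below zero.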

The results of the study suggest that LLMs and other advanced machine learning models have significant potential for making accurate predictions based on textual data. Additionally, they highlight the value of rich texts, such as essays and personal writings, which can provide important insights about the person who wrote them.

Wolfram emphasized that the project took nearly five years to be published, despite the relatively straightforward nature of the main analyses. While his Ph.D. focused on topics at the intersection of social stratification, differential psychology, and genomics, he has since left academia and may not have the opportunity to follow up on this work. However, he believes that future studies using more recent computational models could yield even better predictions.

Notably, at the time of his study, LLMs and other machine learning models were not as advanced as they are today. With the rapid development of these models, similar studies employing newer technologies could potentially achieve even greater accuracy.

Wolfram also pointed out that the approach used in his paper was based on traditional machine learning methods, where models are trained on a set of examples and then validated on unseen data. Today, it would be common to prompt an LLM using a chat interface without providing any training data at all. He suggested that such an approach might outperform the results of his study, highlighting the swift pace of progress in the field.

This article was written by Ingrid Fadelli, edited by Gaby Clark, and fact-checked and reviewed by Robert Egan.
