Forecasting Student Achievement in MOOCs with Natural Language Processing

Type: Evidence | Proposition: B: Teaching | Polarity: | Sector: | Country:

Student intention and motivation are among the strongest predictors of persistence and completion in Massive Open Online Courses (MOOCs), but these factors are typically measured through fixed-response items that constrain student expression. We use natural language processing techniques to evaluate whether text analysis of open-response questions about motivation and utility value can add predictive value for persistence and completion beyond the information obtained from fixed-response items. Compared with simple benchmarks based on demographics and on dictionary-based language analyses, we find that a machine learning prediction model can learn from unstructured text to predict which students will complete an online course. We show that the model performs well out-of-sample within a single course and out-of-context in a related course, though not out-of-subject in an unrelated course. These results demonstrate the potential for natural language processing to contribute to predicting student success in MOOCs and other forms of open online learning.
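The contrast between a dictionary-based score and a model that learns weights from open-response text can be illustrated with a minimal sketch. This is not the authors' model or feature set. It swaps in a plain bag-of-words logistic regression written from scratch. Every function name is hypothetical, and so are the toy responses, the completion labels, the lexicon, and the choice of AUC as the evaluation metric. The held-out texts stand in for "out-of-sample" evaluation. In practice the text features would be added alongside the fixed-response and demographic predictors, which this sketch leaves out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; deliberately simple.
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    # Every word that appears in at least one training response.
    return sorted({w for t in texts for w in tokenize(t)})


def featurize(text, vocab):
    # Binary bag-of-words vector; words outside the vocabulary are ignored.
    tokens = set(tokenize(text))
    return [1.0 if w in tokens else 0.0 for w in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200, l2=0.01):
    # Batch gradient descent on L2-regularized logistic (log) loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    # Predicted probability that the student completes the course.
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def dictionary_score(text, lexicon):
    # Dictionary baseline: share of tokens found in a fixed, hand-picked word list.
    tokens = tokenize(text)
    return sum(t in lexicon for t in tokens) / max(len(tokens), 1)


def auc(scores, labels):
    # Probability that a random completer outscores a random non-completer.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy responses (1 = completed, 0 = did not); not study data.
train_texts = [
    "earn a certificate for my career",
    "skills for my job and career",
    "certificate will help my teaching job",
    "just browsing out of curiosity",
    "curious what videos look like",
    "browsing videos with no plan",
]
train_labels = [1, 1, 1, 0, 0, 0]
held_out_texts = ["a certificate helps my career", "mostly browsing videos"]
held_out_labels = [1, 0]

vocab = build_vocab(train_texts)
w, b = train_logistic([featurize(t, vocab) for t in train_texts], train_labels)
train_scores = [predict_proba(t, vocab, w, b) for t in train_texts]
held_out_scores = [predict_proba(t, vocab, w, b) for t in held_out_texts]
held_out_auc = auc(held_out_scores, held_out_labels)
print(held_out_scores, held_out_auc)
```

The two approaches differ in where the word weights come from. The dictionary baseline fixes them in advance: every listed word counts equally. The learned model estimates a weight for each word from labelled completion outcomes. That is why it can pick up on signals a predefined lexicon does not anticipate. It also explains why it may transfer poorly to a course on an unrelated subject, where the vocabulary differs.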

Citation: Carly Robinson, Michael Yeomans, Justin Reich, Chris Hulleman and Hunter Gehlbach (2016). "Forecasting Student Achievement in MOOCs with Natural Language Processing". In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (LAK '16). ACM, New York.