The Application of Machine Learning in Analyzing Organic Compounds from NMR Spectral Data

Nicole Powell

Name: Nicole Powell
Major: Computer Science
Minor: Math and Chemistry
Advisors: Dr. Sofia Visa and Dr. Heather Guarnera
Nuclear magnetic resonance (NMR) spectroscopy is used in organic chemistry to identify unknown organic compounds. The data obtained from an NMR spectrometer are typically shown as a spectrum, which an analytical chemist then interprets. Analyzing a spectrum, especially one of a large and complex molecule, is a long and tedious process. In this project, Python is used to implement hierarchical clustering on NMR data obtained from an NMR spectrometer at the College of Wooster, to explore its application to NMR analysis. MATLAB is used to build a decision tree from the same data, and its accuracy is compared to that of the hierarchical clustering. The decision tree is also examined for insight into how the analysis process could be better automated. Feature extraction is first performed on the spectral data to identify the major functional groups within each compound. The compounds are then either clustered via hierarchical clustering or classified with a decision tree according to these functional groups. These processes offer insight into identifying unknown organic molecules faster and more accurately, a much-needed improvement in experimental organic chemistry research. When classifying the organic compounds by the functional groups present, the decision tree was found to be much more accurate than hierarchical clustering.
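The clustering step can be illustrated with a minimal sketch of bottom-up (agglomerative) hierarchical clustering. It assumes each compound has already been reduced, by feature extraction, to a binary vector marking which functional groups are present. The group list, the example compounds, the Jaccard distance, and average linkage are all illustrative assumptions. They are not details taken from the project, whose distance measure and linkage method are not stated here.

```python
from itertools import combinations

# Hypothetical feature order: 1 = functional group present, 0 = absent.
# These groups and compounds are illustrative only, not the project's data.
GROUPS = ("alcohol", "carbonyl", "aromatic", "alkene")
names = ["ethanol", "propanol", "acetone", "butanone", "benzene", "toluene"]
vectors = [
    (1, 0, 0, 0),  # ethanol
    (1, 0, 0, 0),  # propanol
    (0, 1, 0, 0),  # acetone
    (0, 1, 0, 0),  # butanone
    (0, 0, 1, 0),  # benzene
    (0, 0, 1, 0),  # toluene
]


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two binary feature vectors (assumed metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(c1, c2, vecs):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vecs, n_clusters):
    """Start with one cluster per compound and repeatedly merge the two
    closest clusters until n_clusters remain. Returns the final clusters
    (sorted lists of compound indices) and the merge history."""
    clusters = [[i] for i in range(len(vecs))]
    merges = []
    while len(clusters) > n_clusters:
        # Closest pair; ties are broken by the lowest cluster indices.
        d, i, j = min(
            (average_linkage(clusters[i], clusters[j], vecs), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[i], clusters[j], d))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters), merges


clusters, merges = agglomerate(vectors, n_clusters=3)
for cluster in clusters:
    print([names[i] for i in cluster])
```

With this toy data the three clusters group the two alcohols, the two ketones, and the two aromatics. Setting `n_clusters=1` records the full merge history, which is the information a dendrogram plots. A library route such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="jaccard"` would compute the same kind of hierarchy.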

Nicole will be online to field comments on April 16:
10am-noon EDT (Asia: late evening, PDT: 7-9am, Africa/Europe: early evening)

Posted in I.S. Symposium 2021, Independent Study on April 10, 2021.