ABSTRACT Automated biomedical image classification has seen enormous improvements in performance over recent years, particularly in radiology. However, the machine learning (ML) methods that have achieved this remarkable performance often require enormous amounts of labeled data for training. An increasingly accepted means of acquiring this data is through the use of natural language processing (NLP) on the free-text reports associated with an image For example, take the following brain MRI report snippet: There is evidence of left parietal encephalomalacia consistent with known history of prior stroke. Small focal area of hemosiderin deposition along the lateral margins of the left lateral ventricle. Here, the associated MRI could be labeled for both Encephalomalacia and Hemosiderin. NLP methods to automatically label images in this way have been used to create several large image classification datasets However, as this example demonstrates, radiology reports often contain far more granular information than prior NLP methods attempted to extract. Both findings in the above example mention their anatomical location, which linguistically is referred to as a spatial grounding, as the location anchors the finding in a spatial reference. Further, the encephalomalacia finding is connected to the related diagnosis of stroke, while the hemosiderin finding provides a morphological description (small focal area). This granular information is important for image classification, as advanced deep learning methods are capable of utilizing highly granular structured data. This is logical, as for instance a lung tumor has a slightly different presentation than a liver tumor. If an ML algorithm can leverage both the coarse information (the general presentation of a tumor) while also recognizing the subtle granular differences, it can find an optimal balance between specificity and generalizability. From an imaging perspective, this can also be seen as a middle ground between image-level labels (which are cheap but require significant data for training?a typical dataset has thousands of images or more) and segmentation (which is expensive to obtain, but provides better training data?a typical dataset has 40 to 200 images), as the fine-grained spatial labels correspond to natural anatomical segments. Our fundamental hypothesis in this project is that if granular information can be extracted from radiology reports with NLP, this will improve downstream radiological image classification when training on a sufficiently large dataset. For radiology, the primary form of granularity is spatial (location, shape, orientation, etc.), so this will be the focus of our efforts. We further hypothesize that these NLP techniques will be generalizable to most types of radiology reports. For the purpose of this R21-scale project, however, we will focus on three distinct types of reports with different challenges: chest X-rays (one of the most-studied and largest-scale image classification types), extremity X-rays (which offer different findings than chest X-rays), and brain MRIs (which present a different image modality and the additional complexity of three dimensions).