Cancer is considered one of the most debilitating health problems facing the world because of its physical, emotional, financial, and spiritual toll. Automating cancer diagnosis can ultimately improve treatment and recovery. Computational methods can greatly improve the efficiency of pathologists through partial or complete automation of the diagnostic process. Computer-aided diagnosis has already augmented preventive check-ups for conditions such as breast cancer, colonic polyps, and lung cancer, and the digitization of tissue slides has opened up diagnosis through the analysis of digital images. The dearth of highly trained pathologists who can address growing diagnostic needs heightens the importance of such automation. Recent advances in big data analytics, and in machine learning in particular, have the potential to greatly impact computer-aided cancer diagnosis. Convolutional Neural Networks (CNNs) have already revolutionized computer vision, in various cases achieving performance comparable to that of humans. One of the main factors that fueled the recent resurgence of CNNs is the availability of large datasets. Through training, CNNs adjust millions of parameters, which allows them to learn complex and highly nonlinear dependencies in data (i.e., images). However, collecting such large amounts of annotated data (data assigned to one of several possible categories, e.g., benign vs. cancerous vs. other stages) is challenging, very expensive, or in many cases infeasible. This is certainly the case in the medical domain. Tissue slides from suspected cancerous regions are examined under a microscope and classified as benign or malignant. CNNs offer a promising pathway toward some degree of automation in identifying cancerous cases in image data.
This research work will explore the challenges of discovering the underlying discriminative features, hidden in the image and possibly different from those used by human experts, in order to improve the accuracy of diagnosis. We will also focus on algorithms that minimize the amount of data required to train the neural network without sacrificing performance or generalization.