Research Article - Onkologia i Radioterapia (2022) Volume 16, Issue 6

Breast cancer detection using bimodal image fusion: Thermography and mammography images

Prabira Kumar Sethy1*, S. Shanthi2, Komma Anitha3, A. Geetha Devi3 and Preesat Biswas4
1Department of Electronics, Sambalpur University, Jyoti Vihar, Burla, India
2Department of Computer Science and Engineering, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
3Department of Electronics and Communication Engineering, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
4Department of Electronics and Telecommunication Engineering, Government Engineering College, Jagdalpur, Chhattisgarh, India
*Corresponding Author:
Prabira Kumar Sethy, Department of Electronics, Sambalpur University, Jyoti Vihar, Burla, Odisha-768019, India, Email:

Received: 19-May-2022, Manuscript No. OAR-22-64211; Accepted: 04-Jun-2022, Pre QC No. OAR-22-64211(PQ); Editor assigned: 23-May-2022, Pre QC No. OAR-22-64211(PQ); Reviewed: 25-May-2022, QC No. OAR-22-64211(Q); Revised: 30-May-2022, Manuscript No. OAR-22-64211(R); Published: 20-Jun-2022


Breast cancer is one of the major causes of mortality among women. When it is detected early, it can be treated with better patient outcomes and significantly lower costs. Many imaging modalities are available for breast cancer diagnosis. Image fusion is a technique that combines the information collected from multiple source images. In this paper, a bimodal image fusion technique is proposed that considers mammography and thermography images of the breast. Deep features are extracted from both kinds of images by three pre-trained networks, AlexNet, VGG16 and VGG19, each used individually. The extracted features are merged by concatenation and then fed to a Support Vector Machine (SVM) classifier to discriminate between sick and healthy cases. VGG16 with SVM, using thermal and mammography images, outperforms the other two networks, with an accuracy of 0.9808, sensitivity of 1, specificity of 0.9615, precision of 0.963, FPR of 0.0385, F1 Score of 0.981, MCC of 0.9623 and Kappa of 0.9615.


Discrete Wavelet Transformation, Image Fusion, Mammography, Super Pixel Segmentation, Thermography


Breast cancer is one of the most common types of cancer worldwide. It occurs in both men and women but is much more common in women. Early detection can keep the disease from spreading, which could otherwise be fatal. Many women do not get regular breast cancer check-ups because of the cost and because traditional tests, such as mammograms, can be painful. Mammograms are considered relatively quick and easy to obtain, but they have limitations such as a small dynamic range, low contrast, and grainy images [1]. Still, mammography remains a common way to screen for breast cancer [2]. Other screening options exist, such as thermography, which recent studies have reported to perform better than mammography. Thermography is a biomedical imaging process that is completely non-intrusive and requires no physical contact [3]. For mammography and breast ultrasound, breast density is a vital issue. Breast thermography, however, recognizes the heat patterns of the affected area, which allows breast cancers to be diagnosed regardless of breast density [4]. In recent years, innovations in medical imaging have brought considerable progress in scanning procedures. Medical image fusion combines several image modalities into one, which opens up the possibility of improving clinical accuracy [5]. If a patient-specific automated breast model is not used to blend individual breast sections and preoperative tumor sites, the indirect fusion of imaging and spatial visualization performed by a healthcare professional or patient may be unclear and erroneous [6]. Image fusion combines different kinds of images by means of a range of image processing techniques.
In this paper, a bimodal image fusion approach is adopted that considers thermal and mammography images.

The primary contributions of this paper are as follows:

• Two image modalities are considered for breast cancer detection.

• Deep features are extracted with pre-trained models from both image modalities and then concatenated to enhance the feature vector.

• The fused features are fed to an SVM classifier for the detection of breast cancer.

The remainder of the paper is organized as follows. Section 2 discusses developments in breast image fusion techniques. Section 3 presents the proposed method in detail. Section 4 presents the experimental results and the accompanying discussion. Section 5 concludes the article.

Related work

Cancer is a group of diseases in which cells in the body grow, change, and multiply out of control. A cancer is usually named after the body part in which it originated; breast cancer thus refers to the erratic growth and proliferation of cells that originate in the breast tissue. A group of rapidly dividing cells may form a lump or mass of extra tissue. These masses are called tumors. Tumors can be either cancerous (malignant) or non-cancerous (benign). Malignant tumors penetrate and destroy healthy body tissues.

Khuwaja et al. proposed a bimodal Artificial Neural Network (ANN) based breast cancer classification system [7]. Microcalcifications are extracted with adaptive neural networks trained on cancer/malignant and normal/benign digital breast mammograms of both Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO) views. The performance of the networks is evaluated using Receiver Operating Characteristic (ROC) curve analysis. Sensitivity-specificity values of 98.0-100.0 for the CC view network and 96.0-100.0 for the MLO view network are recorded for 200 unseen cases from the Digital Database for Screening Mammography (DDSM).

Gong et al. proposed a bimodal ultrasound network (BUS-Net) capable of simultaneously handling B-mode ultrasound (US) and contrast-enhanced ultrasound (CEUS) video [8]. In the CEUS branch, seven CEUS pathological characteristics are used as multiple labels, instead of the traditional two labels (benign and malignant), to extract pathologically representative semantic features. Transforming the binary learning task into a multi-class learning task is intended to make the model more general and robust. In the B-mode US branch, a group of shape descriptors is used to identify hard samples with abnormal morphology. A shape constraint loss term imposes the shape constraints in the training phase and enhances the network's ability to distinguish hard samples. Finally, the features of the two ultrasound modalities are fused to classify tumors as benign or malignant. The authors report that the bimodal strategy significantly improves classification accuracy: compared with existing breast ultrasound classification methods, their method gained an average of 3 percentage points on each evaluation index, and both the TNR and the AUC exceeded 92%.

Sasikala et al. [9] focus on the fusion of Local Binary Pattern (LBP) texture features from ultrasound elastogram and echogram images, followed by feature selection with the Binary Firefly Algorithm (BFA), using the accuracy of an Optimum Path Forest (OPF) classifier as the fitness function. This method is reported to produce 97.3% accuracy, 96.2% sensitivity, 98.2% specificity, 97.3% precision, 96.2% F1 score, 94.71% Balanced Classification Rate, and a Matthews Correlation Coefficient of 0.884, outperforming existing works.

Silva et al. [10] reported a light-based technology for cancer treatment, combining photodynamic and gamma therapies to verify their potential to treat triple-negative breast cancer (TNBC).

Zahar et al. [11] propose a novel bimodal deep residual learning model that consists of the following major steps. First, an informative representation of each input image is constructed separately. Second, the representation layers of every two input images are fused to construct a high-level joint representation and to explore the complementary information between them. Third, all of these joint representations are fused to obtain the final common representation of the input images for the mass. Finally, the recognition result is obtained from the information extracted from all input images. An augmentation strategy was applied to enlarge the dataset collected for that study. The model achieved its best recognition results of 0.898, 0.938, 0.916, 0.964, and 0.917 on the sensitivity, specificity, F1-score, area under ROC curve, and accuracy metrics, respectively.

Zahar et al. [12] present a novel bimodal GoogLeNet-based CAD system that addresses the challenges of combining information from mammographic and sonographic images for solid breast mass classification. In the proposed framework, two distinct monomodal models are trained first, one per modality. A bimodal model is then trained on the high-level feature maps extracted from both modalities. To fully exploit the BI-RADS descriptors, different image content representations of each mass are obtained and used as input images. In addition, a two-step transfer learning strategy is proposed that uses an ImageNet pre-trained GoogLeNet model, two publicly available databases, and the authors' own collected dataset. The bimodal model achieves its best recognition results of 90.91%, 89.87%, 90.32%, 80.78%, 95.82%, and 90.38% in terms of sensitivity, specificity, F1-score, Matthews Correlation Coefficient, area under the receiver operating characteristic curve, and accuracy, respectively.

According to the available research, the vast majority of contemporary methods employ screening procedures that rely on damaging waves, which are detrimental to the human body [25-26]. Methods that combine multiple modalities to diagnose breast cancer at an early stage are also scarce in the literature. Thermography visualizes temperature differences through varying degrees of infrared heat emission. Several features are extracted from the images to make the thermograms easier to interpret and to simplify the diagnosis process for the technician.

Material and Methodology

This section describes the dataset and the adopted methodology.

About dataset

The work uses the Database for Mastology Research (DMR), whose images were acquired with a FLIR SC-620 infrared sensor at a resolution of 640 × 480 pixels. Both the frontal thermograms and the mammogram images are part of the DMR [27,28]. The images have a resolution of 96 dpi in both the horizontal and vertical directions. The database contains images of 287 people aged 29 to 85.

Proposed methodology

A novel bimodal image fusion technique is proposed for medical images of different modalities. The technique uses two kinds of images, thermography and mammography, both taken from the same patient. Its main components are pre-trained deep CNN models and an SVM. Three pre-trained CNN models are used: AlexNet, VGG16 and VGG19. The proposed method is illustrated in Figure 1.


Figure 1: Bimodal image fusion approach for breast cancer (BC) detection using mammography and thermal images.

AlexNet: “It is made up of 5 convolutional layers, 3 max-pooling layers, 2 normalization layers, 2 fully connected layers, and 1 softmax layer. Each convolutional layer is composed of convolutional filters and a ReLU nonlinear activation function. Max pooling is accomplished using the pooling layers. Due to the existence of completely linked layers, the input size 224×224×3 is fixed. If the input picture is grayscale, it is converted to RGB by duplicating the single channel to create a three-channel RGB image. AlexNet’s total parameter count is 60 million, with a batch size of 128 [21]”.
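The grayscale-to-RGB conversion mentioned in the quotation can be sketched in a few lines. This is only an illustration with a toy 2 × 2 image stored as nested lists; the function name `gray_to_rgb` is hypothetical and is not taken from the paper, which does not describe its preprocessing code.

```python
def gray_to_rgb(gray):
    """Duplicate the single channel of a grayscale image into three channels.

    `gray` is a list of rows, each row a list of pixel intensities.
    The result has shape H x W x 3, with identical R, G and B values.
    """
    return [[[pixel, pixel, pixel] for pixel in row] for row in gray]


# Toy 2 x 2 grayscale image (hypothetical values).
gray = [[0, 128], [255, 64]]
rgb = gray_to_rgb(gray)
```

Every pixel keeps its intensity, so no information is added; the image only takes the three-channel shape that a network pre-trained on RGB images expects.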

VGG: “Oxford Visual Geometry Group researchers introduced VGG16 and VGG19 architectures in 2014. ImageNet 2014's top five accuracy rate was 91.90% for VGG16. VGG16 has five convolution blocks, three thick layers, and 138,355,752 parameters. Convolutional layers plus a max pool layer reduce block output size and noise. The first two blocks have two convolutional layers, and the last three have three. This network's kernel stride is 1. After the five blocks, a flatten layer was added to transform the 3D vector of the blocks to a 1D vector for the completely connected layers. The first two fully connected layers have 4096 neurons, while the final has 1000. After the completely linked layers, a softmax layer ensures that the output probability summation is one. VGG19 features 19 convolution layers instead of 16. Layers increase characteristics from 138,357,544 to 143,667,240. The authors claimed that these layers strengthen the architecture and allow it to learn more complex architectures. Sequential blocks reduce spatial information by inserting convolutional layers after each other [22]”.
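The parameter counts of 138,357,544 and 143,667,240 quoted above can be checked with a short calculation. The sketch below assumes the standard VGG configuration, which the quotation does not fully spell out: 3 × 3 convolution kernels with biases, block widths of 64, 128, 256, 512 and 512 channels, a 224 × 224 × 3 input, and three fully connected layers of 4096, 4096 and 1000 units. The function names are illustrative.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return (k * k * c_in + 1) * c_out


def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return (n_in + 1) * n_out


def vgg_params(convs_per_block):
    """Total parameters of a VGG-style network (assumed standard layout)."""
    widths = [64, 128, 256, 512, 512]  # output channels of the five blocks
    total, c_in = 0, 3                 # 3 input channels (RGB)
    for n_convs, c_out in zip(convs_per_block, widths):
        for _ in range(n_convs):
            total += conv_params(c_in, c_out)
            c_in = c_out
    # Five 2 x 2 max-pooling steps reduce 224 x 224 to 7 x 7, with 512 channels.
    total += fc_params(7 * 7 * 512, 4096)
    total += fc_params(4096, 4096)
    total += fc_params(4096, 1000)
    return total


vgg16_total = vgg_params([2, 2, 3, 3, 3])  # 13 convolutional layers
vgg19_total = vgg_params([2, 2, 4, 4, 4])  # 16 convolutional layers
```

Under these assumptions the two totals come out as 138,357,544 and 143,667,240. The difference is the three extra convolutional layers that VGG19 adds, one in each of the last three blocks.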

SVM is a relatively recent and promising technique for learning separating functions in pattern recognition (classification) tasks and for function estimation in regression problems [23]. In the setting considered here, SVMs offer a classification learning model and an associated algorithm rather than a regression model [24]. The goal of employing an SVM is to identify a classification criterion (i.e., a decision function) that, at the testing stage, can accurately classify unknown data with good generalization [25]. A training set is said to be linearly separable if there exists a linear discriminant function whose sign matches the class of each training example. If a training set can be linearly separated, there are typically infinitely many separating hyperplanes. The SVM chooses the separating hyperplane that maximizes the margin, that is, the one that leaves the greatest distance between itself and the nearest example [26]. Consider a set of n data points x_i, each associated with a label y_i that indicates whether the point belongs to the positive class (+1), here malignant, or not (-1).
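For reference, the maximum-margin criterion for linearly separable data is commonly written as the hard-margin problem below. The weight vector \mathbf{w} and the bias b are notation introduced here for illustration and do not appear in the paper; x_i and y_i are the data points and labels defined above.

```latex
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

With this scaling, the margin between the two classes equals 2 / \lVert \mathbf{w} \rVert, so minimizing \lVert \mathbf{w} \rVert maximizes the margin, and a new sample \mathbf{x} is classified by the sign of \mathbf{w}^{\top}\mathbf{x} + b.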

Here, the deep features of AlexNet, VGG16 and VGG19 are extracted individually from the thermal images and the mammography images at the last fully connected layer, fc8. Care must be taken that the deep features of the thermal image and of the mammography image come from the same patient and carry the proper label (right breast or left breast). The 1000 deep features of the thermal image and the 1000 deep features of the mammography image are merged by concatenation. The enhanced feature set of 2000 features (1000 from the thermal image + 1000 from the mammography image) is fed to the SVM for classification. This process is repeated for AlexNet, VGG16 and VGG19. The SVM classifies each case as benign or malignant.
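The pairing and concatenation step can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the dictionary layout keyed by a (patient ID, breast side) tuple, the function name `fuse_features`, and the constant toy vectors are all assumptions made for the example.

```python
def fuse_features(thermal, mammo):
    """Concatenate thermal and mammography feature vectors per breast.

    Both arguments map a (patient_id, side) key to a feature vector.
    Only keys present in both modalities are fused, so a thermal image
    is never paired with a mammogram of another patient or breast side.
    """
    fused = {}
    for key in sorted(thermal.keys() & mammo.keys()):
        fused[key] = list(thermal[key]) + list(mammo[key])
    return fused


# Toy stand-ins for 1000-dimensional fc8 features (hypothetical values).
thermal = {
    ("p01", "L"): [0.1] * 1000,
    ("p01", "R"): [0.2] * 1000,
    ("p02", "L"): [0.3] * 1000,  # no matching mammogram, so it is skipped
}
mammo = {
    ("p01", "L"): [0.9] * 1000,
    ("p01", "R"): [0.8] * 1000,
}

fused = fuse_features(thermal, mammo)
```

Each fused vector has 2000 entries, with the thermal features first and the mammography features second. The fused vectors and their benign or malignant labels would then be passed to an SVM implementation of choice.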

Result and Discussions

The efficacy of the proposed scheme, based on deep features and SVM, is assessed on the DMR dataset of breast images. The technique was implemented in MATLAB 2023a on a 12th-generation Intel(R) Core(TM) i7 processor with 16 GB of RAM and an NVIDIA RTX 3050 Ti Graphics Processing Unit (GPU) with 4 GB of memory. The research framework was implemented and tested using 800 images, with 200 samples in each category: thermal images (200 benign, 200 malignant) and mammography images (200 benign, 200 malignant). The results were obtained with hold-out validation. Table 1 shows the results of SVM with AlexNet, VGG16 and VGG19 in terms of accuracy, sensitivity, specificity and precision. Table 2 shows the results of SVM with AlexNet, VGG16 and VGG19 in terms of FPR, F1 Score, MCC and Kappa. Table 1 shows that the SVM using the deep features of VGG16 gives better results than the other two combinations, VGG19+SVM and AlexNet+SVM. VGG16 with SVM achieved an accuracy of 0.9808, sensitivity of 1, specificity of 0.9615, precision of 0.963, FPR of 0.0385, F1 Score of 0.981, MCC of 0.9623 and Kappa of 0.9615.
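The eight reported measures all follow from the four cells of a binary confusion matrix. The paper does not state its confusion matrix or test-set size, so the counts below (26 true positives, 0 false negatives, 1 false positive, 25 true negatives) are an inference: they are one set of counts that reproduces the reported VGG16+SVM values after rounding. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "fpr": fpr,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Inferred counts (an assumption, not stated in the paper).
m = binary_metrics(tp=26, fn=0, fp=1, tn=25)
```

With these counts, a single benign case misclassified as malignant accounts for every reported value below 1. Any proportional scaling of the four counts would give the same ratios.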

Table 1. Performance evaluation of SVM with deep features of AlexNet, VGG16 and VGG19 in terms of accuracy, sensitivity, specificity and precision.

Table 2. Performance evaluation of SVM with deep features of AlexNet, VGG16 and VGG19 in terms of FPR, F1 Score, MCC and Kappa.




In the medical field, fusing two images from different modalities has always been a challenge. In this paper, two modalities, thermal and mammography images, are considered. The deep features of each set of images are extracted, concatenated, and then fed to an SVM, which distinguishes between benign and malignant images. The main aim of this research is to help medical instrument designers design imaging devices that capture thermal and mammography images simultaneously. In some complex cases, a single imaging modality is unable to detect the cancer; in such cases, the bimodal fusion approach increases the chances of correct detection.

