Why did I made this?

Recently, a friend of mine working in finance asked for a favor: a little tool to solve some work hassles and save her time. She’s swamped with tons of Excel sheets daily, filled with endless entries. Often, there are just minor updates, making row-by-row comparisons a true nightmare. So, she begged for a quick Excel comparison tool. (Yeah, there are paid tools out there, but she’s a bit … thrifty. 😂)

Overview of the app

To quickly craft this tool, I turned to Python, knowing that the pandas library excels at managing Excel data. I hadn’t made GUI applications with Python before. But given my comfy rapport with C/C++’s Qt, I opted for PyQt (Qt’s Python binding). Its logic and most functions are same as pure Qt, so learning it was a cinch. The app’s blueprint is straightforward: PyQt for the GUI and pandas for Excel data. And, I’ve used the PyQt5 version here (Update 2021: the same code works for PyQt6 too!).

Start GUI

The app’s GUI is simple: two buttons to upload Excel files, a button to reset, and a table widget to display the Excel data. After second Excel file is uploaded, the table widget will display the second Excel table with differences highlighted. The differences are highlighted in red. The app also supports multiple sheets in one Excel file, and the sheet names are displayed in the combo button, which can be used to switch between sheets. There is also a label to display the number of differences found.

After GUI

The app’s logic is also simple: load Excel files with pandas, compare them, and display the table with differences highlighted. We’ll go through the code step by step in the following sections.

Requirements and dependencies

The app is written in Python 3, it should work in Windows, Linux, and macOS.

Install the following packages with pip:

$ pip3 install pandas pyqt5

Update 2021:

if you want to use PyQt6, install it with:

$ pip3 install pyqt6

Build the GUI

There are two ways to build the Qt GUI in Python: import the UI file made by Qt Designer or coding it directly. I chose to code it directly, as this app GUI is simple.

The app is not complex, I plan to finish all jobs in one QWidget class. So, I inherited the QWidget class and added the widget in the constructor. All functions are defined in the class. The main code framework is as follows:

# main.py
import sys
# "if you use PyQt6, import it instead of PyQt5"
from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import QApplication, QWidget, QFileDialog, QTableWidget, QTableWidgetItem
from PyQt5.QtWidgets import QButtonGroup, QPushButton, QHBoxLayout, QVBoxLayout, QLabel
from PyQt5.QtGui import QColor
import pandas as pd

class FilesCompare(QWidget):
    def __init__(self):
        super().__init__()

        # init some variables
        self.first_data = None # first Excel data
        self.second_data = None # second Excel data
        self.current_sheet = None # current sheet name
        self.current_button = None # current sheet button
        
        self.initUI()
    
    def initUI(self):
        ......

    ......
    other functions
    ......

def main():
    app = QApplication(sys.argv)
    fc = FilesCompare()
    fc.show()
    sys.exit(app.exec())

if __name__ == '__main__':
    main()

Because we code the UI directly, we need to define all UI elements and layout them in the constructor. Here are the code snippets:

def initUI(self):
    self.setWindowTitle("Excel Diff Checker")
    layout = QVBoxLayout() # main layout is vertical

    self.reset_button = QPushButton("Reset")
    self.reset_button.setFixedSize(150, 30)
    # connect to reset function, signal is reset_button.clicked
    self.reset_button.clicked.connect(self.reset) 
    layout.addWidget(self.reset_button)

    self.first_button = QPushButton("Upload Old File")
    self.first_button.setFixedSize(150, 30)
    # connect to load_first_file function, signal is first_button.clicked
    self.first_button.clicked.connect(self.load_first_file)
    self.first_label = QLabel()
    first_layout = QHBoxLayout()
    first_layout.addWidget(self.first_button)
    first_layout.addWidget(self.first_label)
    layout.addLayout(first_layout)

    self.second_button = QPushButton("Upload New File")
    self.second_button.setFixedSize(150, 30)
    # connect to load_second_file function, signal is second_button.clicked
    self.second_button.clicked.connect(self.load_second_file)
    self.second_button.setEnabled(False)
    self.second_label = QLabel()
    second_layout = QHBoxLayout()
    second_layout.addWidget(self.second_button)
    second_layout.addWidget(self.second_label)
    layout.addLayout(second_layout)

    # sheet buttons
    self.button_group = QButtonGroup(self)
    # connect to sheet_changed function, signal is button_group.idClicked
    self.button_group.idClicked.connect(self.sheet_changed)
    self.sheet_buttons_layout = QHBoxLayout()
    self.sheet_buttons_layout.setAlignment(Qt.AlignmentFlag.AlignLeft)
    self.sheet_buttons_layout.setSpacing(10)
    layout.addLayout(self.sheet_buttons_layout)

    self.result_label = QLabel()
    self.result_label.setStyleSheet("color: blue")
    layout.addWidget(self.result_label)

    # here is the table widget to display Excel data
    self.table_widget = QTableWidget()
    self.table_widget.horizontalHeader().setStyleSheet(
        "QHeaderView::section { background-color: #F0F0F0 }"
    )
    self.table_widget.verticalHeader().setStyleSheet(
        "QHeaderView::section { background-color: #F0F0F0 }"
    )
    self.table_widget.setEditTriggers(QTableWidget.EditTrigger.NoEditTriggers)
    layout.addWidget(self.table_widget)

    self.setLayout(layout)

    self.resize(800, 600)
    self.center_on_screen() # center the window on screen

def center_on_screen(self):
    screen = QApplication.primaryScreen()
    screen_center = screen.availableGeometry().center()
    window_rect = self.frameGeometry()
    window_rect.moveCenter(screen_center)
    self.move(window_rect.topLeft())

def reset(self):
    self.first_data = None
    self.second_data = None
    self.first_label.clear()
    self.second_label.clear()
    self.current_sheet = None
    self.second_button.setEnabled(False)
    self.result_label.clear()
    self.table_widget.clear()
    self.table_widget.setRowCount(0)
    self.table_widget.setColumnCount(0)
    self.current_button = None 

    # remove all sheet buttons
    while self.sheet_buttons_layout.count():
        button = self.sheet_buttons_layout.takeAt(0).widget()
        if button:
            self.button_group.removeButton(button)
            button.deleteLater()
    
    self.resize(800, 600)
    self.center_on_screen()

Qt uses signal and slot mechanism to handle events, so we need to connect the buttons’ clicked signal to the corresponding slots. slot is just a function, and we need to define the functions.

The reset_button is connected to the reset function, the first_button is connected to the load_first_file function, and the second_button is connected to the load_second_file function. The sheet_buttons are connected to the sheet_changed function. The sheet_changed function will be called when the sheet button is clicked, and it will change the current_sheet and update the table widget. The reset function will reset all variables and clear the table widget. The load_first_file function will load the first Excel file. The load_second_file function will load the second Excel file and display the second Excel with differences highlighted in the table widget.

Deal with Excel

Now let’s deal with Excel files. QFileDialog is used to open the file dialog to select Excel files. The pandas.read_excel function can load Excel files into pandas.DataFrame. We can use the pandas.DataFrame.ne (“ne” means “not equal”) function to compare two DataFrames, it will return a DataFrame with True and False values. True means the two DataFrames are not equal, False means the two DataFrames are equal.

Here is the code snippet:

def load_first_file(self):
    # open file dialog to select Excel file, xls and xlsx are supported
    file_name, _ = QFileDialog.getOpenFileName(self, "Upload First Excel File", "", "Excel Files (*.xls *.xlsx);")
    if file_name:
        self.first_data = pd.read_excel(file_name, sheet_name=None)
        self.second_button.setEnabled(True)
        self.reset_button.setEnabled(True)
        self.first_label.setText(file_name.split('/')[-1])
        self.first_label.setStyleSheet("color: blue")

def load_second_file(self):
    file_name, _ = QFileDialog.getOpenFileName(self, "Upload Second Excel File", "", "Excel Files (*.xls *.xlsx);")
    if file_name:
        self.second_label.setText(file_name.split('/')[-1])
        self.second_label.setStyleSheet("color: blue")
        self.second_data = pd.read_excel(file_name, sheet_name=None)
        # create sheet buttons according how many sheets in the Excel file
        for i, sheet in enumerate(self.first_data.keys()):
            button = QPushButton(sheet)
            button.setFixedSize(150, 25)
            self.button_group.addButton(button, i)
            self.sheet_buttons_layout.addWidget(button)
        self.resize(1500,960)
        self.center_on_screen()
        # set the first sheet as current sheet
        self.sheet_changed(0)

def sheet_changed(self, id):
    # if sheet button is switched, change the current sheet
    if self.current_button: 
        self.current_button.setStyleSheet("")
    self.current_sheet = list(self.first_data.keys())[id]
    self.current_button = self.button_group.button(id)
    self.current_button.setStyleSheet("background-color: darkCyan")

    self.compare_and_display_data()

def compare_and_display_data(self):
    if self.first_data is not None and self.second_data is not None:
        first_df = self.first_data[self.current_sheet]
        second_df = self.second_data[self.current_sheet]
        # compare two DataFrames and get a DataFrame with True and False values
        change_mask = first_df.ne(second_df)
        # True means the two DataFrames are not equal, False means the two DataFrames are equal
        count_changes = change_mask.sum().sum() # count the number of changes

        self.result_label.setText(f"Total number of changes in sheet {self.current_sheet}: {count_changes}")  

        # display the second DataFrame in the table widget
        self.table_widget.setRowCount(second_df.shape[0])
        self.table_widget.setColumnCount(second_df.shape[1])
        self.table_widget.setHorizontalHeaderLabels(second_df.columns.tolist())

        for i in range(second_df.shape[0]):
            for j in range(second_df.shape[1]):
                item = QTableWidgetItem(str(second_df.iat[i, j]))
                if change_mask.iat[i, j]:
                    # if the value is different, highlight it in red
                    item.setBackground(QColor(255, 0, 0))
                self.table_widget.setItem(i, j, item)

To be more clear, I will explain these two line of code:

change_mask = first_df.ne(second_df)
count_changes = change_mask.sum().sum()

first_df.ne(second_df) use the pandas.DataFrame.ne funcion, short for “not equal”, to compare each element in first_df against the corresponding element in second_df. It returns True for inequalities and False for matches.

change_mask is a new DataFrame with the same structure as the original DataFrame, but filled with boolean values indicating discrepancies.

Here is an example:

first_df:
       col_A  col_B
row_0   1.5    2.2
row_1   3.1    4.6

second_df:
       col_A  col_B
row_0   1.5    2.7
row_1   5.4    4.6

change_mask:
       col_A  col_B
row_0  False  True
row_1  True   False

change_mask.sum().sum() is to count the True values in change_mask, indicating differences. change_mask.sum() counts the True values column-wise. change_mask.sum().sum() then sums up these column counts to give a total difference count across the entire DataFrame. In this example, it’s 2, indicating two discrepancies between first_df and second_df.

Other code is easy to understand, I also added some comments in the code, which should be easy to follow.

Run the app and conclusion

Put all above functions into class FilesCompare, and run the app with python3 main.py. The app should work as expected. But I didn’t add any error handling here. For example, if the Excel file is not valid, the app will crash. You can add some error handling code to make the app more robust.

The code is quite simple, but it’s enough for my friend’s needs. I hope this article can help you too.