rakesh kumar

Posted on Oct 27, 2023

Explain Apache POI Data-Driven Framework

Apache POI Data-Driven Framework

Classes and Interfaces in Apache POI

List of different Java Interfaces in the Apache POI API

List of different classes in the Apache POI
Real time example of Apache POI Data-Driven Framework in selenium python

Apache POI Data-Driven Framework

Apache Poor Obfuscation Implementation (POI) is a set of open-source Java-based libraries developed by the Apache Foundation which is used to manipulate Microsoft Excel documents such as .xls and .xlsx file formats.
It is an extensive API (Application Programming Interface) used to automate Selenium Data-driven tests by which we can create, modify, read, and write excel data by using the Java program.
POI API supports both binary and spreadsheet workbook. It has many features to work with modern Microsoft products.
The test performance with the .xlsx file will be slower as compared to the .xls file in the Apache POI.
Apache POI is dependent on the xmlbeans library which must be added in the build path before executing any test.

Classes and Interfaces in Apache POI

Apache POI contains a list of different Java Interfaces and classes for reading the XLS and XLSX format files.

List of different Java Interfaces in the Apache POI API

Workbook

The term “Workbook” represents Microsoft excel file. It is a super-interface of the HSSFWorkbook and XSSFWorkbook classes and presents in org.apache.poi.ss.usermodel package.

It is used to create and maintain the spreadsheet. A workbook may contain many sheets. HSSFWorkbook and XSSFWorkbook implement workbook interface.

Sheet

A sheet is a super-interface of the HSSFSheet and XSSFSheet classes and presents in the org.apache.poi.ss.usermodel package. It creates a high-level or low level of spreadsheets with a specific name.

A spreadsheet is a page in a Microsoft Excel file that contains the number of rows and columns and is identified with a specific name.

The columns and rows are identified with alphabets and numbers respectively.

The most common type of spreadsheet is a worksheet that represents a grid of cells. HSSFSheet and XSSFSheet classes implement sheet interface.

Row

A row represents a collection of cells. It is a super-interface of the HSSFRow and XSSFRow classes and presents in the org.apache.poi.ss.usermodel package.

It is used to represent a row in the spreadsheet. HSSFRow and XSSFRow classes implement the row interface.

*Cell *

A cell represents a collection of columns in the spreadsheet. When we enter data into a spreadsheet, the data is always stored in a cell. A cell is identified by the level of rows and columns.

It is a super-interface of all the classes which represents a cell in a row of the spreadsheet. HSSFCell and XSSFCell classes implement the cell interface.

Types of Cells

The cell type represents that a cell may contain strings, numeric values, or formulas. A string cell cannot hold numeric values and a numeric cell cannot hold string values.

The following are the types of cells, their value, and syntax.

List of different classes in the Apache POI

Apache POI is a popular Java library for working with Microsoft Office documents, such as Excel spreadsheets and Word documents. It provides several classes and interfaces for different purposes when dealing with Office documents. Here are some of the key classes and their purposes:

HSSFWorkbook/XSSFWorkbook:

These classes represent Excel workbooks.
HSSFWorkbook is for the older .xls format, while XSSFWorkbook is for the newer .xlsx format.
Example:

HSSFWorkbook workbook = new HSSFWorkbook(); // for .xls
XSSFWorkbook workbook = new XSSFWorkbook(); // for .xlsx

HSSFSheet/XSSFSheet:

These classes represent Excel sheets within a workbook.
HSSFSheet is for .xls, and XSSFSheet is for .xlsx.
Example:

HSSFSheet sheet = workbook.createSheet("Sheet1"); // for .xls
XSSFSheet sheet = workbook.createSheet("Sheet1"); // for .xlsx

HSSFRow/XSSFRow:

These classes represent rows within an Excel sheet.
HSSFRow is for .xls, and XSSFRow is for .xlsx.
Example:

HSSFRow row = sheet.createRow(0); // for .xls
XSSFRow row = sheet.createRow(0); // for .xlsx

HSSFCell/XSSFCell:

These classes represent individual cells within a row.
HSSFCell is for .xls, and XSSFCell is for .xlsx.

HSSFCell cell = row.createCell(0); // for .xls
XSSFCell cell = row.createCell(0); // for .xlsx

HSSFRichTextString/XSSFRichTextString:

These classes represent rich text strings in cells.
Example:

HSSFRichTextString richText = new HSSFRichTextString("Hello, world!"); // for .xls
XSSFRichTextString richText = new XSSFRichTextString("Hello, world!"); // for .xlsx
HSSFCellStyle/XSSFCellStyle:

These classes define cell styles, such as fonts and formatting.
Example:

HSSFCellStyle cellStyle = workbook.createCellStyle(); // for .xls
XSSFCellStyle cellStyle = workbook.createCellStyle(); // for .xlsx

HSSFFont/XSSFFont:

These classes define font styles.
Example:

HSSFFont font = workbook.createFont(); // for .xls
XSSFFont font = workbook.createFont(); // for .xlsx

HSSFDataFormat/XSSFDataFormat:

These classes define data formats for cells.
Example:

HSSFDataFormat dataFormat = workbook.createDataFormat(); // for .xls
XSSFDataFormat dataFormat = workbook.createDataFormat(); // for .xlsx

HSSFDateUtil/XSSFDateUtil:

These classes provide utility methods for working with date cells.
Example:

Date date = HSSFDateUtil.getJavaDate(cell.getNumericCellValue()); // for .xls
Date date = XSSFDateUtil.getJavaDate(cell.getNumericCellValue()); // for .xlsx

HSSFPictureData/XSSFPictureData:

These classes represent picture data embedded in the Excel document.
Example:

HSSFPictureData pictureData = workbook.getAllPictures().get(0); // for .xls
XSSFPictureData pictureData = workbook.getAllPictures().get(0); // for .xlsx

These are some of the main classes in Apache POI that you'll commonly use when working with Excel files. The examples provided are in Java, as Apache POI is primarily a Java library, but you can use similar classes and concepts in other languages if there are compatible libraries available.

Real time example of Apache POI Data-Driven Framework in selenium python

he real-time application of an Apache POI Data-Driven Framework in Selenium with Python can be found in automating repetitive tasks in scenarios where data is stored in Excel files. Here's a practical example:

Scenario: Consider a scenario where you need to perform data-driven testing for a registration form on a website. You want to validate that the registration form works correctly for multiple users by reading the user data from an Excel file using the Apache POI framework.

Steps:

Excel Data Preparation: Prepare an Excel file containing user registration data, such as usernames, email addresses, passwords, and expected outcomes.

Example Excel Data:
Excel Data Preparation: Prepare an Excel file containing user registration data, such as usernames, email addresses, passwords, and expected outcomes.

Example Excel Data

Selenium Test Script:

Here's a sample Python script that demonstrates how to use the Apache POI Data-Driven Framework to automate the registration form testing:

Execution:

When you run this script, it will open the website, fill in the registration form with data from the Excel file, submit the form, and then compare the actual result with the expected result for each user.

Reporting:

You can further enhance the framework to generate detailed test reports, capture screenshots on failure, and handle exceptions for more robust testing.

In real-time applications, data-driven frameworks like this allow you to perform the same actions repeatedly with different sets of data. It's particularly useful in scenarios like user registration, login, data entry, and validation, where you need to test multiple data combinations efficiently and comprehensively without writing separate test cases for each scenario.

Execution:

Reporting:

You can further enhance the framework to generate detailed test reports, capture screenshots on failure, and handle exceptions for more robust testing.

Note

In the line of code for row in sheet.iter_rows(min_row=2, values_only=True):, min_row=2 is an argument that specifies the starting row from which the iteration over the rows of an Excel sheet should begin. The iter_rows method is used to iterate through the rows of the Excel sheet, and min_row allows you to skip the header row or any rows you don't want to include in your iteration.

Here's an example to illustrate the usage of min_row:

Consider you have the following data in an Excel sheet:

| ID  | Name     | Age |
| --- | -------- | --- |
| 1   | Alice    | 30  |
| 2   | Bob      | 25  |
| 3   | Carol    | 35  |

Now, if you use the iter_rows method without specifying min_row, it will iterate through all rows, including the header row. For example:

import openpyxl

workbook = openpyxl.load_workbook("example.xlsx")
sheet = workbook.active

for row in sheet.iter_rows(values_only=True):
    print(row)

This code will output:

(None, 'ID', 'Name', 'Age')
(1, 'Alice', 30)
(2, 'Bob', 25)
(3, 'Carol', 35)

In this case, the header row is included in the iteration.

Now, if you want to skip the header row and start iterating from the second row (i.e., the row with ID=1), you can specify min_row=2 like this:

import openpyxl

workbook = openpyxl.load_workbook("example.xlsx")
sheet = workbook.active

for row in sheet.iter_rows(min_row=2, values_only=True):
    print(row)

This code will output:

(1, 'Alice', 30)
(2, 'Bob', 25)
(3, 'Carol', 35)

By specifying min_row=2, you effectively skip the first row (header) during the iteration. This is commonly used in data-driven testing scenarios, as you often want to exclude the header row when iterating through the test data in an Excel file.