The answer varies on how sparse the physical rows are in your sheet, your desire to have simple code, and how much you care about execution speed.
A good compromise of the three would only loop through physical rows, and perform decently well as long as startRow is much closer to getFirstRowNum() than getLastRowNum().
public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
for (Row row : sheet) {
if (row.getRowNum() < startRow) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
}
For a workbook with dense rows, your code is better.
For the code that does the least work, you could combine these two approaches (I prefer for-loops over while-loops as it's quicker to verify that your code won't get stuck in an infinite loop)
public static boolean isColumnEmpty(Sheet sheet, int columnIndex, int startRow) {
int firstRow = sheet.getFirstRowNum();
int lastRow = sheet.getLastRowNum();
// No need to check rows above the first row
startRow = Math.max(startRow, firstRow);
int numRows = sheet.getPhysicalNumberOfRows();
// Compute an estimate of the number of rows that each method
// will access.
// Assume the cost to access one row is the same
// between an explicit getRow() or through the rowIterator.
// Assume physical rows are uniformly spaced, which is unlikely true
// but the best estimate possible without iterating over the rows.
double rowDensity = (lastRow - firstRow + 1) / numRows;
double estimatedForEachLoopCost = numRows;
double estimatedForLoopCost = (lastRow - startRow) + 1) * rowDensity;
if (estimatedForEachLoopCost < estimatedForLoopCost) {
// for-each iteration
for (Row row : sheet) {
if (row.getRowNum() < startRow) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
} else {
for (int r=startRow; r<=lastRow; r++) {
Row row = sheet.getRow(r);
if (row == null) continue;
Cell cell = row.getCell(columnIndex, Row.RETURN_BLANK_AS_NULL);
if (cell != null) {
return false;
}
}
return true;
}
}
If you really care about performance, you can fork POI and write a method to expose the TreeMap<Integer, XSSFRow> that XSSFSheet uses to access the rows.
Then you could access the minimal number of rows with _rows.tailMap(startRow, inclusive=true).
If you put in a patch and test case on the POI bugzilla for a method that returns java.util.Collections.unmodifiableSortedMap(_rows.subMap(startRow, true, endRow, true)) from HSSF, XSSF, and SXSSF (fails if start row or end row are outside access window, or uses a column tracker similar to autosize column tracker), then add an isColumnEmpty function to the appropriate class, then you could avoid maintaining a fork if your patch gets accepted.