How to Count Specific Words in Excel Text String

How to Count Specific Words in Excel Text String

How can you count specific words in a text string, using an Excel formula?  For example, a cell has this text – air, hair brush ,hot air, hair. The word “air” should be counted once. It shouldn’t be counted if it’s part of other words or phrases.

Cells with Text Strings

Here is the sample data that I used for this problem. To get the sample file, with the sample data and the Excel formulas, go to the Count Specific Text in Cell page on my Contextures site.

There are four cells with the following comma-separated words and phrases. There are extra spaces in some of the items, before or after the commas.

  • air,dig,air,air
  • air, hair brush ,hot air, hair
  • house, dig, air , hair , air, dig
  • house ,air flow,hair, air

At the top of the sheet, you can enter a search word or phrase in cell B2.

In column C, formulas count the number of times that exact word or phrase is found between the commas.

countspecifictextcell17

Rules for Counting Words

For this challenge, we don’t want to count the search word/phrase if it’s part of another word or phrase. We only want to count it when it’s on its own.

For example, there’s only 1 “air” in row 6. We won’t count the “air” in hair brush, hot air, or hair.

NOTE: If you DO want to count each occurrence, use the LEN/SUBSTITUTE formula shown on my site.

countspecifictextcell17a

Google Sheets SPLIT Function

Before we look at Excel formulas, let’s take a quick look at how easy this solution is in Google Sheets.

Unlike Excel, Google Sheets has a SPLIT function.

Combine SPLIT with SUBSTITUTE, to remove the spaces, and COUNTIF, to get the count:

  • =COUNTIF(SPLIT(SUBSTITUTE(B5,” “,””),”,”),$B$2)

countspecifictextcell26

Excel Formulas to Count Words

Unfortunately, Excel doesn’t have a SPLIT function, so we need to use a longer formula to get a count of the words.

  • Excel 2013 or later: Use the FILTERXML formula, shown below
  • Earlier versions: Use LEN and SUBSTITUTE. Go to my website for two solutions using these functions.

Separate Text with FILTERXML

While looking for ways to split text in Excel, I found a really creative way to do that – use the FILTERXML function. The first place I saw this trick was in a comment on Chandoo’s forum.

Splitting text isn’t what FILTERXML was designed for! Usually, it’s used with the WEBSERVICE function, to to extract specific items from text that’s already in XML format. You can see an example of that type of FILTERXML formula on the Microsoft site.

Anyway, some clever person realized that you could build the XML on the fly, from comma-separated text. Here’s what it would look like, all tidied up. The angle brackets mark the beginning and end of each item.

countspecifictextcell24

XML Data for FILTERXML Function

Here’s how we’ll use FILTERXML to split the text, so we can get a count of matching items.

The FILTERXML function has 2 arguments – XML and XPath.

  • For the XML argument, we’ll use SUBSTITUTE to change comma-separated text into an XML structure.
  • In the sample file, there are XML marker strings at the top of the sheet,  in cell E1, F1 and G1.

countspecifictextcell09

This formula creates our XML code for the FILTERXML function:

  • $E$1 & SUBSTITUTE(B6,”,”,$F$1) & $G$1

It’s not pretty, but the FILTERXML function will recognize those angle bracket start and end markers.

countspecifictextcell25

XPath for FILTERXML Function

The tricky part of this formula was the XPath argument. After many hours of digging through the internet, and experimenting with XPath variations, I finally created an XPath that did what I need.

The XPath is created with a formula in cell H1, at the top of the worksheet.

  • =”//i[normalize-space()='” & B2 & “‘]/@n”

This XPath will return the “n” value for the items that match our search word (in cell B2).

There are more details on XPath further down this page, if you’re interested.

countspecifictextcell27

The FILTERXML Formula

Now that we have the XML and the XPath, here is the FILTERXML part of final formula.

The XML is in bold red, and the XPath is in bold italic blue.

  • FILTERXML($E$1 &SUBSTITUTE(B5,”,”,$F$1) &$G$1,$H$1)

The result will be a list of 1s, or a zero, so next, we need to get a total of those numbers.

countspecifictextcell28

Get the Count from FILTERXML

To get the count, we can use the SUM function. Each matching item returns a 1, so 3 items will give us a sum of 3 – which is equal to the count that we need.

Finally, the formula is wrapped with IFERROR. It will return a zero, if no matches are found.

Here is the completed formula:

  • =IFERROR(SUM(FILTERXML($E$1 &SUBSTITUTE(B5,”,”,$F$1) &$G$1,$H$1)),0)

I’ve highlighted the matching items in the comma-separated text, and the counts are all correct.

countspecifictextcell17

Experiments with FILTERXML XPath

  • NOTE: You don’t need to read any further, unless you want to hear about the long and winding journey I took , while building this formula. Maybe you’ve been on similar trips!

Creating the XML data for the FILTERXML function was the easy part. The challenge was building an XPath that did what I needed.

The first FILTERXML example I found, in Chandoo’s forum, used a simple XPath – //i – to create a list of items.

In my first experiment, I used FILTERXML in a worksheet cell, with that simple XPath in cell E4

With this formula in cell E6, it created a list of all the items from cell B6:

  • =FILTERXML($E$1 & SUBSTITUTE(B6,”,”,$F$1) & $G$1,E4)

To count the matching items, I put this formula in cell E5:

  • =COUNTIF(E6#,B2)

That worked nicely, and showed the correct count of 1.

countspecifictextcell19

COUNTIF and FILTERXML

Next, I needed a single formula, so I combined the 2 steps:

  • =COUNTIF(FILTERXML($E$1 & SUBSTITUTE(B6,”,”,$F$1) & $G$1,E3),$B$2)

Excel was not impressed! Right – COUNTIF doesn’t work with formula arrays.

countspecifictextcell18

SUMPRODUCT and FILTERXML

The SUMPRODUCT function is more forgiving, so I tried it next, using this formula in cell E5:

  • =SUMPRODUCT((E6#=B2)*(E6#<>””))

That worked correctly, so I combined it with the FILTERXML formula:

  • =SUMPRODUCT((FILTERXML($E$1 & SUBSTITUTE(B6,”,”,$F$1) & $G$1,E4)=B2)
    * (FILTERXML($E$1 & SUBSTITUTE(B6,”,”,$F$1) & $G$1,E4)<>””))

The combined formula worked correctly, but wow, that’s a long formula!

COUNT and FILTERXML

Another test showed that COUNTA works with formula-created arrays. However, I’d need an XPath that filters out the items that match the search word.

Believe me when I tell you that it’s hard to find XPath examples for that on the internet.

Finally, I found this example – //i[.=’air’] – which worked, but I had to add another SUBSTITUTE, to remove spaces.

  • =FILTERXML($E$1 & SUBSTITUTE(SUBSTITUTE(B7,” “,””),”,”,$F$1) & $G$1,E4)

Also, that formula created a #VALUE! error, if the search word wasn’t found.

And it counted the error! That’s no good.

countspecifictextcell20

And, if I wrapped the formula with IFERROR, to return an empty string, it counted the empty string.

Sigh.

countspecifictextcell21

Return a Number

What if the XPath could return a number, instead of the text item? If it could, I’d return a zero in IFERROR, and use SUM to total the results!

On this new mission, I went deeper and deeper into the XPath rabbit hole, but couldn’t find what I needed.

Finally, I remembered seeing an XPath example that filtered based on an attribute. Maybe I could add an attribute of “1” to each item, and return that attribute for the filtered items.

So, I changed the data/item text strings at the top of the sheet, in cells E1 and F1.

Now, each item has an “n” attribute with a value of “1”.

countspecifictextcell09

Change the XPath

More Googling found an XPath setting that would return those attributes. The XPath ended with the AT symbol, and the attribute name:

Also, I found an XPath function to trim the leading, trailing, and duplicate spaces:

  • normalize-space

And here is the final XPath string that I used for this project, after days of wandering through the internet:

  • //i[normalize-space()=’air’]/@n

And this is the final formula, that counts the matching items:

=IFERROR(SUM(FILTERXML($E$1 &SUBSTITUTE(B5,”,”,$F$1) &$G$1,$H$1)),0)

It was an exhausting journey through the FILTERXML and XPath rabbit holes, but rewarding in the end!

countspecifictextcell17

Get the Sample File

To get the sample file, go to the Count Specific Text in Cell page on my Contextures site.

The zipped file is in xlsx format, and does not contain any macros.

There is also a link to the Google Sheets version of the sample data and formulas. That file is VIEW ONLY, but you can create a copy to edit.

_________________________________

How to Count Specific Words in Excel Text String

Count Specific Words in Text String

Count Specific Words in Text String

________________________________

2 thoughts on “How to Count Specific Words in Excel Text String”

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.