*Added an Addendum at the end

Hopefully my Title isn't too vague, but I will try to elaborate here. I posted a similar question: Given 10 random letters where the number of repeated letters is known (i.e. 3,2,1,1,1,1,1), what's the formula for finding the number of combinations? And it looks like I understand the formula for getting permutations once I have the multiset, but I don't understand how to get the multisets to begin with. So this question is just about generating the multisets themselves.

I am not sure if I am using the correct terminology (distribution, possible, unique, combinations, etc.) as I am not a mathematician or student taking a class on this. I am a Software Performance Engineer trying to solve a problem, but I need to understand the problem first. So, please refrain from using terminology, symbols, or expressions that would only be understood by someone who has a deep understanding of probability and multisets to begin with.

My knowledge on this subject is only what I have been able to learn in the past 2-3 days. If you are going to use symbols, shorthand, or terminology specific to this type of math, then please explain what it means. As I found in my other question, (26 1) apparently means something way different than (5 3) and somehow (5 3) = 10!/3!2!, but (26 1) = 26/1... I don't understand how I am supposed to know that or understand it. Also, the Union, Element, and Sum symbols that I have seen in formulas related to this topic (i.e. $|A|=\sum_{x\in \operatorname{Supp}(A)} m_A(x)=\sum_{x\in U} m_A(x)$) don't make sense to me as they have completely different meanings and uses in my field of work. Please try to use math symbols and explanations that can be understood by anyone and don't skip steps when possible.

With that out of the way, given the 26 letters of the English alphabet, how do I find the number of possible multisets for 4 random letters, and then for 5 random letters? To understand it completely, I am looking for all possible combinations of letters. I think that if I can get formulas and explanations that can be applied to those situations, then I should be able to take those and apply them to the 10 letter problem that I am really trying to solve.

Starting with 4, if each letter is unique: 1111 (apxz). Then if there is a duplicate: 211 (aact). Then if there are 2 duplicates: 22 (nnii). Then if there is a triplet: 31 (oooy). And finally a quadruplet: 4 (rrrr). Then I need to do the same for 5 letters. So, 11111 (abcde); 2111 (uuakl); 221 (ppjjx); 311 (mmmsc); 32 (hhhww); 41 (qqqqz); 5 (ggggg).
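The distributions listed here are the ways of writing 4 (or 5) as a sum of positive whole numbers where order is ignored. A minimal Python sketch for listing them follows; the function name `partitions` and its parameters are illustrative choices, not something taken from this thread.

```python
def partitions(n, max_part=None):
    """Yield every way to write n as a non-increasing list of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []  # nothing left to split: one (empty) way
        return
    # Pick the largest part first, then split the remainder using parts
    # no bigger than it, so each distribution is produced exactly once.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

for k in (4, 5):
    print(k, list(partitions(k)))
```

For 4 letters this yields the five distributions 4, 31, 22, 211, 1111. For 5 letters it yields seven, including the all-same distribution 5.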

Please don't get hung up on the letter examples that I have given as those are just one potential combination based on the distribution, but I am looking for all possible combinations for that distribution. For the 41 example, I could just as easily have used (aaaab), (zzzzq), (jjjjs), etc. because every letter combination is equally possible. I am just trying to work out how many possible ways there are given a known number of repeated letters.

Remember, I am not looking for just the answer, but rather how you go about finding the answer. I need to know which formula to use and why. If I am only given answers that apply to specific scenarios, then there is no way that I can use them to solve for future problems. I am trying to learn to fish, not just be handed a fish.

Thank you in advance.

Addendum:

I'm including a few examples to hopefully illustrate what I am trying to ask. Let's say I have a four letter pattern of "evet". In the final answer to the real world problem I am looking to solve, the order will matter, but for this question the order doesn't matter. So, "veet", "eetv", "vtee" are all the same as "evet" for the purposes of this question.

So, I have that one set of 4 letters, but I need to know how many other variations fit the same pattern. Instead of "evet", I could have gotten "avat", "bbxy", or "ossp". And "avat" = "vaat", "tvaa", etc. "bbxy" = "bbyx", "xybb", "bxyb", etc. (Probably didn't need to reiterate that, but since it seems like people are going out of their way to misunderstand my question, it might have been necessary.) I could have gotten any combination of letters so long as one of them is a couple and the other two are monuples. So, how do I write a formula where I can take the number of possible letters (26), then apply the 211 distribution of letters to it so that I get the total number of possible 4 letter words with one letter doubled - the total number of variants that are in a different order, but still contain the same letter combination?

After I have that, I will need to be able to do the same if I started with "eett" instead. Will the same formula for 211 work for 22? If not, why not? Maybe the problem is that people are answering with simplified formulas for easier versions, but I need one that I can apply universally, where n=pool of possible letters (26), k=number of selected letters (4 in these examples, but it could just as easily be 5, 10, 20, 100, 1000, etc.), and z=the known number of repeated letters ({2,2} in this example, but it could just as easily be {4}, {3,1}, {1,1,1,1}, or {2,1,1} when k=4, or it could be {51,25,14,5,3,2}, {95,3,1,1}, etc. when k=100, or {22,17,8,2,1}, {37,5,5,1,1,1}, etc. when k=50).

I keep seeing answers like (26 1)(25 2) or (26 1)×(25 2)×(23 2), but don't understand where these are coming from. I see there is a formula for binomial coefficients: n!/(k!(n−k)!), which I can use to get the (26 1) if n=26 and k=1, so 26!/(1!(26-1)!) = (26·25!)/(1·25!) = 26/1. Then I get where the 25 comes from as a letter is already accounted for from the previous step and thus is not in the pool of possible letters. But why 2? Where does that come from? If we are using 211, why is (25 2) used? Why not (25 1)(24 1)? That gets a different answer because nothing is divided by 2, but I don't see why a 1 was used for the 2 in 211, but a 2 is used for the 11 in 211. And I haven't seen the breakdown answer for k=4 with z=22, but I suspect that neither (26 4) nor (26 2) will work based on previous responses. It's probably closer to (26 1)*(25 2) or something like that, but I have no way of knowing currently.

  • To be clear... what is the number you are looking for in the easy cases of $1$ or $2$? Are you looking for the answer of $26$ and $351$? If so, read about stars and bars. Or are you looking for the answers of $1$ and $2$ for the cases of $1$ and $2$ respectively (having counted a and the cases of aa vs ab) – JMoravitz Jul 11 '22 at 02:54
  • This is related to partitions of a number, which itself is a complicated subject without many easy formulas. – aschepler Jul 11 '22 at 02:54
  • Aside: ${26 \choose 1} = \frac{26!}{1! \cdot 25!}$ simplifies to $26$. Many of the factors always cancel out, so there are shortcuts for writing those combination numbers with some canceled numbers omitted in the first place. – aschepler Jul 11 '22 at 03:00
  • I have no idea how this differs from your prior question, which got a comprehensive answer. Are you just asking about the ordinary notation for basic computations in combinatorics? If so, just look it up. – lulu Jul 11 '22 at 03:19
  • It seems I failed once again to clarify my question on this exchange. I am not looking for a number, the number of partitions, or anything related to stars and bars (or at least I don't think I am, I don't understand half of the information in that article given the amount of shortcuts, uncommon terms, and symbols used in their formulas).

    So, 4 random letters. We know only that they consist of a couple and two monuples (ie. 211) How do I find the number of possible combinations? aabc=1,aacd=2,...zzwx=?,zzxy=?; then for 22, then 31, 4, and 1111. Then do the same for 5 letters.

    –  Jul 11 '22 at 03:27
  • Very helpful lulu... why don't I just look it up? So from here: https://en.wikipedia.org/wiki/Combinatorics#:~:text=Combinatorics%20is%20an%20area%20of,certain%20properties%20of%20finite%20structures.... hmm so which formula do I use? How about?: https://mathigon.org/world/Combinatorics... hmm still not seeing anything like (26/1)((25·24)/2)((23·22)/2) in there. Can I just use: n!/(r!(n – r)!), nope, that doesn't work. I don't know why or how the other users got their numbers, but I guess I didn't realize I could just ask Google. Thanks for reminding me, because me dumb dumb stupid head. –  Jul 11 '22 at 03:45
  • You wrote "aabc=1,aacd=2,...zzwx=?,zzxy=?" Surely you meant to say "aabc,aabd,aabe,aabf,...wyzz,xyzz" if you were to list them in dictionary order. So, you are asking how to count the number of multisets possible that represent a particular partition? In this case asking how many multisets of size four there are which consist of two repeated letters? Each individual case can be found if you wished to examine a particular case. Pick the letters used for each of the parts of size $n_1$ simultaneously, then pick the letters used for parts of size $n_2$ simultaneously etc... – JMoravitz Jul 11 '22 at 03:49
  • In the specific case of sets of size four consisting of one number repeated and two others which are not repeated, that would be $26\cdot \binom{25}{2}$. This does get tedious quickly if you wanted to list them all out, it is easier to just look at a specific case and find the number there. If you were to add all of these results together however it does simplify to the stars and bars formula of $\binom{26+n-1}{26-1}$ – JMoravitz Jul 11 '22 at 03:50
  • Yeah, JMoravitz, your version makes more sense except at the end. I don't know why you would order them xyzz instead of zzxy when the first letter is supposed to be the double. But, I guess the order doesn't matter yet so xyzz = zzxy. Still, it does make more sense to go through all the aab's before doing the aac's.

    And yes that is what I am trying to ask sort of. For now I am looking for a particular partition, but I want to be able to apply something for any possible partition for any length of letters. For now I am keeping it at 4 and 5, but the final formula should work for 10 or 100.

    –  Jul 11 '22 at 03:59
  • "Pick the letters used for each of the parts of size $n_1$ simultaneously, then pick the letters used for parts of size $n_2$ simultaneously etc"... Ok, what does that even mean? I'm gonna need an example.

    "that would be $26\cdot \binom{25}{2}$" How did you arrive at that?

    "it does simplify to the stars and bars formula of $\binom{26+n-1}{26-1}$"... So, is that for the 211? or is that for 1111,211,22,31,4 all combined. I am not looking for the total combined, just how to find the number of possible multisets if given 211, 22, or 4.

    –  Jul 11 '22 at 04:07

1 Answer

Assume that the alphabet has $(26)$ characters, and that any Multiset will be drawing its letters from the alphabet.

First, I need to define the term multiplicity, as it relates to a Multiset. Consider the Multiset $\{A,A,A,A,B,B,C,C,D,E\}$. This Multiset has:

  • $1$ letter with multiplicity $4$.
  • $2$ letters, each having multiplicity $2$.
  • $2$ letters, each having multiplicity $1$.

To count how many Multisets there are for a specific Multiset pattern, you have to do two things:

  • Identify how many distinct multiplicities occur within the Multiset.

  • For each distinct multiplicity, identify how many letters have this multiplicity.

If you identify a Multiset pattern by its multiplicities, then the Multiset described above would follow the pattern $\{4,2,2,1,1\}$.

When enumerating the number of Multisets that follow this pattern, there are only two relevant characteristics:

  • How many distinct multiplicities there are.

  • Within each distinct multiplicity, how many letters have this multiplicity.

What this indicates is that (for example), the number of distinct Multisets that follow the pattern $\{4,2,2,1,1\}$ is the exact same as the number of distinct Multisets that (for example) follow the pattern $\{17, 13, 13, 11, 11\}$.

I will illustrate these ideas by enumerating the number of distinct Multisets that follow the pattern $\{4,2,2,1,1\}$ and distinct Multisets that follow the pattern $\{17,13,13, 11,11\}.$

For $\{4,2,2,1,1\}$, you have to choose $1$ letter that will serve as the letter in the Multiset that has multiplicity $(4)$.

Then, you have to choose $2$ letters from the remaining letters of the alphabet. These $2$ letters will each serve as the letters in the Multiset that have multiplicity $(2)$.

Then, you have to choose $2$ letters from the remaining letters of the alphabet. These last $2$ letters will serve as the letters in the Multiset that have multiplicity $(1)$.

So, you have to choose letters $(3)$ times, because you have $(3)$ distinct multiplicities in the $\{4,2,2,1,1\}$ pattern. Further, for each of Selection-1, Selection-2, Selection-3, you must choose $1,2,$ and $2$ letters respectively.

So, the enumeration is

$$\binom{26}{1} \times \binom{[26 - 1]}{2} \times \binom{[26 - 1 - 2]}{2}. \tag1 $$

So, the $3$ binomial factors represent that there are $(3)$ distinct multiplicities. In each binomial factor $~\displaystyle \binom{n}{k}~$, the $k$ component represents how many different letters share the same multiplicity.

Now, consider the Multiset whose pattern is (for example) $\{17,13,13, 11,11\}.$ The enumeration in (1) above is the exact same for this Multiset pattern as it was for the $\{4,2,2,1,1\}$ pattern.

This is because each pattern involved $3$ distinct multiplicities, and each multiplicity had $1,2,$ and $2$ letters respectively that shared this multiplicity.
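As a quick numeric check of (1), here is a minimal Python sketch. It relies only on the standard-library function `math.comb(n, k)`, which computes $\frac{n!}{k!\,(n-k)!}$; the variable names are illustrative.

```python
from math import comb

ALPHABET = 26  # size of the letter pool

# Pattern {4,2,2,1,1}: 1 letter of multiplicity 4, 2 letters of
# multiplicity 2, and 2 letters of multiplicity 1.
pick_quad = comb(ALPHABET, 1)             # the letter that appears 4 times
pick_doubles = comb(ALPHABET - 1, 2)      # the 2 letters that appear twice
pick_singles = comb(ALPHABET - 1 - 2, 2)  # the 2 letters that appear once

count_42211 = pick_quad * pick_doubles * pick_singles
print(count_42211)  # 1973400
```

That is $26 \times 300 \times 253 = 1{,}973{,}400$. Nothing in the computation uses the multiplicities $4, 2, 1$ themselves, which is why $\{17,13,13,11,11\}$ gives the same count.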

A different perspective, when enumerating the $\{4,2,2,1,1\}$ pattern is that first you choose $1$ specific letter that will have multiplicity $(4)$. Suppose that you choose the letter $K$. Then, when you go to choose the next $3$ letters of the alphabet, for this Multiset, you have no choice. The next $3$ letters chosen must be $K,K,K$. This is because you chose the letter $K$ to have multiplicity $(4)$.

Then, suppose that the next two letters, each of which will have multiplicity $2$ are chosen to be $N$ and $C$. Then, the next two letters that you choose must also be $N$ and $C$, to (again) conform to the multiplicity assigned to the $N$ and $C$ letters.
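One way to sanity-check this kind of count is brute force on a small alphabet: list every Multiset of the right size and keep those whose multiplicities match the pattern. The sketch below assumes a hypothetical 6-letter alphabet so the listing stays small; `brute_force_count` is an illustrative name.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def brute_force_count(pattern, alphabet_size):
    """Count multisets over `alphabet_size` letters by listing all of them."""
    target = sorted(pattern)
    hits = 0
    # Each tuple produced here is one Multiset (order ignored) of sum(pattern) letters.
    for letters in combinations_with_replacement(range(alphabet_size), sum(pattern)):
        if sorted(Counter(letters).values()) == target:
            hits += 1
    return hits

SMALL = 6  # assumed toy alphabet size
print(brute_force_count([2, 1, 1], SMALL), comb(6, 1) * comb(5, 2))  # 60 60
print(brute_force_count([2, 2], SMALL), comb(6, 2))                  # 15 15
```

For the pattern 22 there is only one distinct multiplicity, so a single binomial factor with $k = 2$ is used.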


Take a different example: to enumerate the Multisets that follow the pattern $\{3,2,1,1,1,1,1\}$ you would:

  • Count the number of distinct multiplicities.
    In this case there are $(3)$.

  • Count how many letters share each of the multiplicities.
    In this case, the numbers of letters are $1,1,5,$ respectively.

So the enumeration here would be

$$\binom{26}{1} \times \binom{25}{1} \times \binom{24}{5}.\tag2 $$

Again, note that in the $~\displaystyle \binom{n}{k}~$ factors in (2) above, the $k$ components follow the pattern $1,1,5$ specifically because the numbers of letters sharing each of the three distinct multiplicities are $1,1,5$.
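Evaluating (2) numerically, as a short sketch using Python's standard `math.comb` (the variable name is illustrative):

```python
from math import comb

# Pattern {3,2,1,1,1,1,1}: one triple, one double, five singles.
count_3211111 = comb(26, 1) * comb(25, 1) * comb(24, 5)
print(count_3211111)  # 27627600
```

Here $\binom{24}{5} = 42504$, so the product is $26 \times 25 \times 42504 = 27{,}627{,}600$.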


More generally, suppose that you have a Multiset with $r$ different multiplicities. Denote these distinct multiplicities as $m_1, m_2, \cdots, m_r$. Further suppose that for each index $i$ in $\{1,2,\cdots, r\}$ the number of letters that share multiplicity $m_i$ is $p_i.$

So, it is being assumed that you have:

  • $p_1$ letters of multiplicity $m_1$.
  • $p_2$ letters of multiplicity $m_2$.
  • $\cdots$
  • $p_r$ letters of multiplicity $m_r$.

Because there are only $(26)$ letters in the alphabet, it is also being assumed that:
$p_1 + p_2 + \cdots + p_r \leq 26.$

Then, the enumeration of the number of distinct Multisets that follow this pattern will be:

$$\binom{26}{p_1} \times \binom{[26 - p_1] }{p_2} \times \binom{[26 - p_1 - p_2]}{p_3} \times \cdots $$

$$\times \binom{[26 - (p_1 + p_2 + \cdots + p_{r-1})]}{p_r}.$$

Note that in this generic case, which involved the distinct multiplicities $m_1, \cdots, m_r$ with $p_1, \cdots, p_r$ different letters assigned to $m_1, \cdots, m_r,$ respectively, it is irrelevant what the actual sizes of the multiplicities $m_1, m_2, \cdots, m_r$ happen to be. As a way of illustrating that, note that the variables $m_1, m_2, \cdots, m_r$ do not appear anywhere in the generic formula above.
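The generic formula can be sketched in Python as follows. This is one possible implementation, not the only one: the function name `count_multisets` and the choice to pass the pattern as a list of multiplicities (one entry per distinct letter) are assumptions made for illustration.

```python
from collections import Counter
from math import comb

def count_multisets(pattern, alphabet_size=26):
    """Count Multisets whose letter multiplicities match `pattern`.

    `pattern` lists one multiplicity per distinct letter, e.g. [2, 1, 1].
    """
    # Counter groups equal multiplicities; its values are p_1, ..., p_r.
    letters_per_multiplicity = Counter(pattern).values()
    remaining = alphabet_size
    total = 1
    for p in letters_per_multiplicity:
        if p > remaining:
            return 0  # the pattern needs more distinct letters than exist
        total *= comb(remaining, p)  # choose p letters from those still unused
        remaining -= p
    return total

for pattern in ([4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]):
    print(pattern, count_multisets(pattern))
```

For 4 letters this gives 26, 650, 325, 7800 and 14950. These add up to 23751, which equals $\binom{29}{4}$, the stars and bars total for all Multisets of size 4 over 26 letters.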

user2661923
  • Thank you! That is exactly what I was looking for. I am sure that other articles, answers, and examples might have been expressing the same thing, but just in a format I didn't understand or with the use of shorthand that made it impossible for me to follow.

    I am still not exactly sure why in your example you chose 2 at a time for the N,C then N,C again rather than N as the first double, then C as the second double which would result in (25 1)*(24 1), but I don't think it is necessary for me to understand that to write a working formula that I can use, scale, and adjust for my needs.

    –  Jul 11 '22 at 05:43
  • @TimothyJoseph It is important to associate specific mathematical factors with specific actions. The $~\displaystyle \binom{25}{2}~$ factor represents selecting (in the specific example) two distinct letters (such as N and C), each different from the first letter chosen, K. Then, once these two letters are chosen, it is as if there now exists two buckets: an N bucket and a C bucket. Further, since the pertinent multiplicity here is $(2)$, the N bucket must add a 2nd N, so that the N bucket now has $(2)$ letters in it. The C bucket must also be similarly formatted. – user2661923 Jul 11 '22 at 13:04
  • @TimothyJoseph There is another hidden issue. Consider the Multiset pattern $\{1,2\}$. I have asserted (in effect) that the number of different Multisets that follow this pattern is $$\binom{26}{1} \times \binom{25}{2}.$$ This begs the question: re previous comment, if all that the pertinent binomial factors represent is selecting a single character to go into specific Multiplicity buckets, then why isn't the enumeration instead $$\binom{26}{3}?$$ ...see next comment – user2661923 Jul 11 '22 at 13:12
  • Answer: because with the $~\displaystyle \binom{26}{3}~$ factor, having chosen the $3$ letters, you then have to choose which of the $3$ letters will have multiplicity $(1)$, rather than multiplicity $(2)$. So, $$\binom{26}{3} \times 3 = \binom{26}{1} \times \binom{25}{2}.$$ – user2661923 Jul 11 '22 at 13:14
  • Ok, a slight follow up. Does the order of the multiplicities matter? One example you used was 42211, but would 11224 yield the same result? because then you would get (26 2)(24 2)(22 1). I haven't done the math yet to see if they are equivalent. I was just hoping you might know. –  Jul 12 '22 at 05:02
  • @TimothyJoseph The order that the multiplicities are listed is totally irrelevant, as long as identical multiplicities are grouped together. So, $\{4,2,2,1,1\}$ may be equivalently expressed as (for example) $\{1,1,4,2,2\}$ or $\{1,1,2,2,4\}.$ It is a reasonable (and short) exercise to actually do the three corresponding computations pertinent to this comment, and verify that the results are mathematically equivalent. – user2661923 Jul 12 '22 at 13:07
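The three computations mentioned in the last comment, together with the $\binom{26}{3} \times 3$ identity from the earlier comment, can be checked with a short Python sketch (standard-library `math.comb` only; the variable names are illustrative):

```python
from math import comb

# Same pattern, multiplicities listed in three different orders.
order_42211 = comb(26, 1) * comb(25, 2) * comb(23, 2)
order_11422 = comb(26, 2) * comb(24, 1) * comb(23, 2)
order_11224 = comb(26, 2) * comb(24, 2) * comb(22, 1)
print(order_42211, order_11422, order_11224)  # 1973400 1973400 1973400

# Choosing 3 letters and then which one is "the odd one out" matches
# choosing that one letter first and then the other two.
print(comb(26, 3) * 3, comb(26, 1) * comb(25, 2))  # 7800 7800
```

Each product simplifies to $\frac{26!}{1!\,2!\,2!\,21!}$, which does not depend on the order in which the groups are chosen.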