Page 1 of 3

10745 - Dominant Strings

Posted: Sun Oct 17, 2004 12:53 pm
by BiK
Hello,

We tried solving the problem by reading the strings one after the other and maintaining a list of the currently nondominated strings. When a new string is read, the list is scanned and every member of the list that is dominated by the new string is removed from the list. If some of the strings in the list dominates the new one the scan stops and we read the next string. At the end of the scan, if none of the strings in the list dominates the new string, it is added to the list. However, this solution timed out. Could some of you post an idea for a faster solution and running time analysis if possible?

Greetings

Posted: Sun Oct 17, 2004 1:59 pm
by marian
Hint: There are fewer strings one given string can dominate than total number of strings.

Posted: Sun Oct 17, 2004 3:15 pm
by ..
I solve it in this way:
Read all the string.
Sort the strings in some way such that if string B < string A, string B never dominates string A.
Then check for all pair (string B, string A) if string A dominates string B.

Posted: Sun Oct 17, 2004 4:13 pm
by BiK
Hi,

.. : 10x, this is beutiful. And of course you have to look at all adjacent pairs, otherwsie you'll time out.

marian: I still don't get your hint.

Posted: Sun Oct 17, 2004 5:24 pm
by marian
..: Aren't you checking O(n^2) pairs in the worst case (ie. no dominating strings)? Unless I didn't get your algorithm.

BiK: I was just saying that one string of length K can dominate at most 2^K-2 strings. Since K<=10, one string can dominate at most 1022 strings. So it's cheaper to generate all possible strings, given string can dominate and check if it is one of the given strings. If so, flag that string as non-dominating.

I used trie to represent input strings, and used recursive procedure to generate all dominated strings. So the complexity is O(N*2^K), which is better that O(N^2), because 2^K < N, in the worst case.

Posted: Sun Oct 17, 2004 5:32 pm
by ..
marian wrote:..: Aren't you checking O(n^2) pairs in the worst case (ie. no dominating strings)? Unless I didn't get your algorithm.
Yes, but enough to get AC :wink:
marian wrote: I used trie to represent input strings, and used recursive procedure to generate all dominated strings. So the complexity is O(N*2^K), which is better that O(N^2), because 2^K < N, in the worst case.
Your alogrithm is much more interesting than mine.
But how can you check if the generated string is in the input quickly?
There are totally (N*2^K) generated string, are you generate all these strings and then sort them for checking?~

Posted: Sun Oct 17, 2004 7:26 pm
by BiK
marian: Thanks for your explanation. As far as I understand, you look at the strings as multisets of characters. And in your trie you do not distinguish between strings that give rise to the same multiset. Is that correct? For example you can sort the string lexicographically. And for a given multiset you keep the indeces of the input strings that produce the given multiset. So I assume that given an input string you generate the multisets dominated by that input string and mark them in the trie as non-dominating. And at the end you traverse your trie to find the multisets that are dominating. This is how I imagine your algorithm working. Am I correct?

..: I tried your approach and it timed out. If you got accepted, I think that it is exceptional and this is not the idea of the solution. Do you use pure C? I'm currios how you managed to get under the time limit with an O(N^2) solution.

Posted: Sun Oct 17, 2004 7:51 pm
by ..
BiK wrote: ..: I tried your approach and it timed out. If you got accepted, I think that it is exceptional and this is not the idea of the solution. Do you use pure C? I'm currios how you managed to get under the time limit with an O(N^2) solution.
I used pure C. I tell more about how I optimize the algorithm.
For each string, I will store how many different kind of chars and the occurance of each kind in the string. So string A dominates string B only if number of kind in A >= number of kind in B.
Also, for string A, B, C, if I know string C is dominated by B, I will mark the string C dominated and won't check again if any other string A dominates C.

The above rules will not decrease the complexity, but it decreases the practical number of opertaions. If I remove these 2 rules of my AC code, the runtime will grow from 4s to 8s.

Posted: Sun Oct 17, 2004 8:22 pm
by marian
BiK wrote:marian: Thanks for your explanation. As far as I understand, you look at the strings as multisets of characters. And in your trie you do not distinguish between strings that give rise to the same multiset. Is that correct?
Exactly.
BiK wrote:For example you can sort the string lexicographically. And for a given multiset you keep the indeces of the input strings that produce the given multiset.
In fact, I'm keeping the opposite - for every string I keep the index of apropriate multiset. In the trie, I have indices of multisets.
BiK wrote:So I assume that given an input string you generate the multisets dominated by that input string and mark them in the trie as non-dominating. And at the end you traverse your trie to find the multisets that are dominating.
Almost, only I'm not traversing trie at the end, but flagging non-dominating multiset indices to some array.

Also, to optimize this solution a bit, I'm traversing a trie as I generate dominated strings (I have node pointer as a parameter of my recursive function). This way, I can reduce the number of generated strings, because I can stop generating, when the needed edge is not in the trie.

Marian

Posted: Sun Oct 17, 2004 11:55 pm
by tat tvam asi
to BiK
i think that the brute force
O(nn) approach mentioned by
Lawrence (aka ..) can easily get
accepted with the current input
and time limit. I used "pure" c
(no asm,register etc.) and O(nn).
csaba noszaly

Posted: Mon Oct 18, 2004 1:39 pm
by BiK
think that the brute force O(nn) approach mentioned by Lawrence (aka ..) ...
Who is Lawrence?

Posted: Mon Oct 18, 2004 2:21 pm
by ..
Me :D

Bah

Posted: Tue Oct 19, 2004 2:09 am
by Michael Goldshteyn
I, on the other hand, don't think that tries are the best data structure for this problem. Any data structure used for this problem has a construction penalty, especially once you break the 1 s barrier. I think an exhaustive search with good short circuiting heuristics goes best here combined with cheap to construct data structures.

What kept me inspired was the fact that the former first place implementation was written in PASCAL. Surely, I could not let this stand, to the embarassment of C++ programmers, everywhere :oops: .

Michael Goldshteyn
( 10745 - 0.129 s )

P.S. for those that still do not understand the problem statement, here is a simplified version:

Given strings A and B. B dominates A if BOTH conditions below are satisfied:

- Every distinct character in A also appears in B
- Any character that appears multiple times in A must appear at least that many times in B

So, a simple way of saying this is that a string B dominates a string A if A can be constructed using some or all of the characters from B.

Here are examples where the first string is dominated by the second:

abc bac (in fact, each dominates the other, so neither is a dominant string)
abc dcba
aabcc aaabccd
abccc aaabccddcc

Examples where the first string is NOT dominated by the second:

abc abdefg
abbc abcadef
aabbbcccc abbabcddcbce

Posted: Tue Oct 19, 2004 8:11 am
by Whinii F.
Okay, it seems I'm the second C++ programmer who broke the pascal implementation.. Haha :wink: (With a very close shave of 0.002 second :lol: )

What I did was inserting all vectors of alphabet frequencies into a trie. By building the trie with the frequencies (instead of the letter itself) I think you can reduce the amount of exhaustive searches you will make.

Posted: Tue Oct 19, 2004 11:15 am
by wiltord
I don't know how to implement the trie, But get AC with divide and conquer algorithm. Divide the string set to two parts, then gets the dominant strings in the two parts respectively with recursion, i call the results as candidate dominant strings of part1 and candidate dominant strings of part2. Then tests each candidate string of part1 by enumerate all the candidate strings of part2, if no candidate strings of part2 can dominate it, it is a dominant string of this set; and the same to candidate strings of part2. This method gets AC, but slow. :(