10679 - I Love Strings!!

All about problems in Volume 106. If there is a thread about your problem, please use it. If not, create one with its number in the subject.

Moderator: Board moderators

Post Reply
erdos
New poster
Posts: 32
Joined: Sat Jul 06, 2002 9:38 pm
Location: Lisbon
Contact:

10679 - I Love Strings!!

Post by erdos » Tue Jun 29, 2004 3:12 am

I was trying to solve this problem with the naive solution (using strstr).
It gives TLE.
I've created a huge test file and the naive solution solves that huge test file very fast.
So... has the judge a lot of test files ?
I've few motivation to implement a better algorithm when I can't see the naive is slow.

By the way, if strstr is well implemented, a better solution is hard to implement considering there are only 100 queries in a single string, isn't it ?

rotoZOOM
Learning poster
Posts: 63
Joined: Mon Dec 22, 2003 5:05 am
Location: Russia, Chelyabinsk
Contact:

Re: 10679

Post by rotoZOOM » Tue Jun 29, 2004 3:44 am

All naive solution have complexity O(n*m) in worst case,
where n - lenght of base string, and m - length of substring.
What for this problem n=100000 and m=1000, O(n*m)=O(100 000 000) multiple it by number of queries and by number of test cases you get
VERY BIG complexity.
Use O(n+m) algo. For example, Knuth-Morris-Pratt.

-best,
Pavel

User avatar
cytmike
Learning poster
Posts: 95
Joined: Mon Apr 26, 2004 1:23 pm
Location: Hong Kong and United States
Contact:

Post by cytmike » Tue Jun 29, 2004 5:29 am

Is there no other ways to do so other than KMP?

rotoZOOM
Learning poster
Posts: 63
Joined: Mon Dec 22, 2003 5:05 am
Location: Russia, Chelyabinsk
Contact:

Post by rotoZOOM » Tue Jun 29, 2004 8:29 am

cytmike wrote:Is there no other ways to do so other than KMP?
I suppose you may use hash function.
But you have to select very good hash-function.

Larry
Guru
Posts: 647
Joined: Wed Jun 26, 2002 10:12 pm
Location: Hong Kong and New York City
Contact:

Post by Larry » Tue Jun 29, 2004 8:44 am

This topic is well-researched. I ended up using Turbo Boyer-Moore. I thought it was slightly unforunate that it became a "guess the algorithm" contest, because my first submission was Ukkonen's algorithm with a suffix tree, but even that got TLE.. (during the contest..)

I'm curious as to what kind of algorithm people used..

anupam
A great helper
Posts: 405
Joined: Wed Aug 28, 2002 6:45 pm
Contact:

Post by anupam » Tue Jun 29, 2004 9:12 am

Nice problem for KMP.
Thanks to Sajjad bhai :lol:
"Everything should be made simple, but not always simpler"

Adrian Kuegel
Guru
Posts: 724
Joined: Wed Dec 19, 2001 2:00 am
Location: Germany

Post by Adrian Kuegel » Tue Jun 29, 2004 9:24 am

Another nice algorithm for this problem is Aho-Corasick. But the tricky thing is that the input seems to contain in some queries duplicate patterns, so the tree has to store the same pattern more than once.

anupam
A great helper
Posts: 405
Joined: Wed Aug 28, 2002 6:45 pm
Contact:

Post by anupam » Tue Jun 29, 2004 9:28 am

Adrian, would you please describe the algorithm? it seems new to me. Or would be give me a link please??
"Everything should be made simple, but not always simpler"

Adrian Kuegel
Guru
Posts: 724
Joined: Wed Dec 19, 2001 2:00 am
Location: Germany

Post by Adrian Kuegel » Tue Jun 29, 2004 9:47 am

All patterns are stored in one tree. Then failure links are created (that have same meaning as in KMP). And then for each node, the first failure link that points to a pattern is recorded as "output link". This is necessary because some pattern may be sub-pattern of a bigger pattern and would not be found otherwise.
Example:
2 patterns: string, in
the tree looks like this:

Code: Select all

______
|     |
s(1) i(2)
|     |
t(3) n(4)
|
r(5)
|
i(6)
|
n(7)
|
g(8)
I numbered the nodes in bfs order. This order is necessary to build the failure links.
Failure link of some node is failure link of parent + last read character,
or if this is not a node in the tree, it is the failure link of node pointed to by failure link of parent + last read character ... until either a node in the tree is found or the root is reached.
So in the example, there is a failure link from 6 to 2, from 7 to 4, and from all other nodes to the root.
When we are matching a text, it is sufficient to go through the text once.
For each character in the string, we try to follow an edge in the tree. If there is no such edge, follow a failure link (like in KMP). For each reached node, follow the output link in order not to miss a string to output.
Lets search in the string "strings".
We follow the edges in the tree 0->1, 1->3, 3->5, 5->6, 6->7
now we reach output link to 4 and see that we matched the pattern "in".
then we continue with 7 -> 8. Now we have matched the pattern "string".
and then we have to take failure link, because there is no edge labeled 's'.

I think you will also find some information in the internet if you use google.

User avatar
abishek
Experienced poster
Posts: 131
Joined: Mon Dec 15, 2003 5:41 am

Post by abishek » Tue Jun 29, 2004 11:01 am

i tried rabin karp by hashing the strings modulo some prime numbers.
what ever prime number i tried, i got TLE, and now i have the code Ac in 4.5 secs.
could some one tell me how to efficiently chose the prime in rabin karp

bye
abi

mido
Learning poster
Posts: 78
Joined: Sun Jun 16, 2002 9:48 pm
Location: Cairo,Egypt

Post by mido » Tue Jun 29, 2004 4:20 pm

Uhh..our team solved this during the contest by, guess what, REVERSING the test strings, and then using strstr..the reversing process killed the tricks in the input, according to my teammate.

User avatar
sohel
Guru
Posts: 856
Joined: Thu Jan 30, 2003 5:50 am
Location: New York

poor data set

Post by sohel » Wed Jun 30, 2004 9:50 am

I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.
:-?

pminkov
New poster
Posts: 8
Joined: Wed Oct 17, 2001 2:00 am
Location: Bulgaria, Sofia

book

Post by pminkov » Wed Jun 30, 2004 10:59 am

If you can, check out 'Algorithms on Trees and Sequences' -
Aho-Corasick is described there in detail.

Observer
Guru
Posts: 570
Joined: Sat May 10, 2003 4:20 am
Location: Hong Kong

Re: poor data set

Post by Observer » Wed Jun 30, 2004 12:19 pm

sohel wrote:I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.
:-?
Well, I don't think that the "reversing" technique is so bad after all. It is not easy to come up with this idea during contest time! One has to be really smart to think of this. :D

Anyway, it would be great if I could learn the advanced algorithms you guys mentioned above. It takes time, of course :wink:
7th Contest of Newbies
Date: December 31st, 2011 (Saturday)
Time: 12:00 - 16:00 (UTC)
URL: http://uva.onlinejudge.org

User avatar
sohel
Guru
Posts: 856
Joined: Thu Jan 30, 2003 5:50 am
Location: New York

odd

Post by sohel » Wed Jun 30, 2004 2:17 pm

Something very interesting :

After reversing the strings and using strstr() took .187 seconds...

.. later I thought of a better idea.. ie reverse then apply Boyer Moore.
.. but that gave TLE..

So it seems that for some cases strstr is better than Boyer Moore.

.. can any one tell me the reason why I get TLE with the modified method.
:-?

Post Reply

Return to “Volume 106 (10600-10699)”