11107  Life Forms
Moderator: Board moderators

 New poster
 Posts: 31
 Joined: Sat Nov 17, 2001 2:00 am
 Contact:
11107  Life Forms
Does anybody know how to solve this within 16 MB? I'm using a trie, but it uses so much memory, so it exceeds the memory limit (although the actual memory limit used in waterloo is 64MB ). Thanks.
Re: 11107  Life Forms
Use a suffix array.Ilham Kurnia wrote:Does anybody know how to solve this within 16 MB? I'm using a trie, but it uses so much memory, so it exceeds the memory limit (although the actual memory limit used in waterloo is 64MB ). Thanks.
 little joey
 Guru
 Posts: 1080
 Joined: Thu Dec 19, 2002 7:37 pm
An intriguing problem. I fear it is one of those weird combinatorial design problems for which every size is a special case.little joey wrote:Just out of curiosity: does anybody know how to determine the maximum number of solutions that can be found, given the limits of the problem (and the appropriate input)?
I've been thinking about it, but I can't get my reasoning closed.
 little joey
 Guru
 Posts: 1080
 Joined: Thu Dec 19, 2002 7:37 pm
Container classes and the standard methods are like Swiss army knives: they give average solutions to average problems. For specific problems, the best solution can only be found by having a good understanding of the problem and using the appropriate tools. Imagine a carpenter creating all his furniture with his Swiss knife...
I've been thinking about the problem I posed above. It is not so hard to find an upperbound:
A line of length L can only contain Ll+1 different substrings of length l, so if you have N lines of length L, there are a maximum of (Ll+1)*N substrings available, where each different substring should occur at least N/2+1 times in separate lines. For L=1000 and N=100, as in the problem, there can only be (100*998)/51 = 1956 substrings of length 3 appearing in the answer. This is an upperbound for all substring lengths (for l=4, there are only max (100*997)/51 = 1954 substrings in the answer, and for l=2, there are only 26*26 = 676 different substrings).
A lowerbound is 998, because it is not hard to create a case with 998 substrings of length 3 in the answer:
 create a line of the form "aabaacaad..aaybbabbcbbd..bbyccaccbccd..." of length 1000, not using the letter z;
 add this line 50 times in total to the input;
 create 4 additional lines of the form "aabzabazbaaz...", just the 998 different substrings, separated by the letter z;
 the remaining 46 lines only contain the letter z.
So the 'only' remaining task is to create an input file, such that the answer consists of 1956 strings. There might be a trivial way to do this, but it could also be impossible. If it is trivial, then it should also be possible to solve it for other values of L and N, the general case.
I've been thinking about the problem I posed above. It is not so hard to find an upperbound:
A line of length L can only contain Ll+1 different substrings of length l, so if you have N lines of length L, there are a maximum of (Ll+1)*N substrings available, where each different substring should occur at least N/2+1 times in separate lines. For L=1000 and N=100, as in the problem, there can only be (100*998)/51 = 1956 substrings of length 3 appearing in the answer. This is an upperbound for all substring lengths (for l=4, there are only max (100*997)/51 = 1954 substrings in the answer, and for l=2, there are only 26*26 = 676 different substrings).
A lowerbound is 998, because it is not hard to create a case with 998 substrings of length 3 in the answer:
 create a line of the form "aabaacaad..aaybbabbcbbd..bbyccaccbccd..." of length 1000, not using the letter z;
 add this line 50 times in total to the input;
 create 4 additional lines of the form "aabzabazbaaz...", just the 998 different substrings, separated by the letter z;
 the remaining 46 lines only contain the letter z.
So the 'only' remaining task is to create an input file, such that the answer consists of 1956 strings. There might be a trivial way to do this, but it could also be impossible. If it is trivial, then it should also be possible to solve it for other values of L and N, the general case.
uhm... this is one of rare problems where i really start wondering if i should do them in PASCAL
after few optimisations, i finally got it to run fairly fast on my PC even for more complicated cases with 100 lines, each around 1000 long, and with common string about 500 long. It runs fast on judge too, about 3.5 units (whatever those units are) , but it ends up with WA ;\
I made test input set generator: randomly chose number of lifeforms N, and randomly chose 'common string'. Then for it generate each of N lines randomly in length, and in about 60% of them it insert pregenerated common string.
I tested over 100000 of such test cases, and in almost all my program found solution that included pregenerated common string, and in remaining few cases it found longer common strings.
I even suspected that i get runtime error instead of WA (even if i dont get one on my test cases), so i included error handler that should have turned runtime into TE instead of WA  but i still got WA, indicating that program finishes but judge does not like results.
One posibillity is some special case which is hard to randomly generate , but where my program has error. Another posibility is that on some test case i found longer string that judge did , like i found longer ones than my test generator made...but i doubt that ;p
So reason for this post is  if anyone have any input case that could be potentially tricky, i would appreciate if you post it here, since at this point i have no idea why i pass all my test cases, and still keep getting those WAs ;p
after few optimisations, i finally got it to run fairly fast on my PC even for more complicated cases with 100 lines, each around 1000 long, and with common string about 500 long. It runs fast on judge too, about 3.5 units (whatever those units are) , but it ends up with WA ;\
I made test input set generator: randomly chose number of lifeforms N, and randomly chose 'common string'. Then for it generate each of N lines randomly in length, and in about 60% of them it insert pregenerated common string.
I tested over 100000 of such test cases, and in almost all my program found solution that included pregenerated common string, and in remaining few cases it found longer common strings.
I even suspected that i get runtime error instead of WA (even if i dont get one on my test cases), so i included error handler that should have turned runtime into TE instead of WA  but i still got WA, indicating that program finishes but judge does not like results.
One posibillity is some special case which is hard to randomly generate , but where my program has error. Another posibility is that on some test case i found longer string that judge did , like i found longer ones than my test generator made...but i doubt that ;p
So reason for this post is  if anyone have any input case that could be potentially tricky, i would appreciate if you post it here, since at this point i have no idea why i pass all my test cases, and still keep getting those WAs ;p
BTW, just as an information related to post above, I was getting WAs due to runtime error, not because of some tricky test case ;p
Also, if anyone else is doing these in PASCAL, one reason I had timeouts was that I expected FreePascal complier to behave too much like Delphi .... to avoid using something like MyStringArray[idx1][charIdx], i defined var sn: string; sn:=MyStringArray[idx1] and later used sn[charIdx]
That approach works ok in Delphi since there variable of type string point at same memory area as original string, until original string is changed, thus s:=s1; does not result in memory copy of s1 into area reserved for s;
But it seems like FreePascal does not have such string optimisations , so if you use it (like i did) in innermost procedure , it can more than triple your time. While I was getting TE, i did several optimizations to get accepted, and only then I got idea to try and avoid that string copy, and from 8sec my time dropped to 2sec ... meaning i could have skipped those other (more complicated) optimizations ;p
Also, if anyone else is doing these in PASCAL, one reason I had timeouts was that I expected FreePascal complier to behave too much like Delphi .... to avoid using something like MyStringArray[idx1][charIdx], i defined var sn: string; sn:=MyStringArray[idx1] and later used sn[charIdx]
That approach works ok in Delphi since there variable of type string point at same memory area as original string, until original string is changed, thus s:=s1; does not result in memory copy of s1 into area reserved for s;
But it seems like FreePascal does not have such string optimisations , so if you use it (like i did) in innermost procedure , it can more than triple your time. While I was getting TE, i did several optimizations to get accepted, and only then I got idea to try and avoid that string copy, and from 8sec my time dropped to 2sec ... meaning i could have skipped those other (more complicated) optimizations ;p

 New poster
 Posts: 33
 Joined: Fri Jan 06, 2006 5:51 pm
Here's my approach:
I try all substrings of length L and insert them into a map <string, int>, which keeps track of the total occurences of a substring (I take care not to insert the same substring twice from the same string, i.e. the substring "xx" occurs in "xxx" twice, I took care of counting in only once).
Then I loop over all the entries in the map, when I find one of the substrings having number of occurences > n/2 I store it somewhere and indicate that choosing length L can lead to a valid solution.
Now I use binary search on L to find the maximum length that will produce substrings with size > n/2, then output those substrings.
I couldn't find a more efficient algorithm, but unfortunately I get a 'memory limit exceeded'. Any idea ??
I try all substrings of length L and insert them into a map <string, int>, which keeps track of the total occurences of a substring (I take care not to insert the same substring twice from the same string, i.e. the substring "xx" occurs in "xxx" twice, I took care of counting in only once).
Then I loop over all the entries in the map, when I find one of the substrings having number of occurences > n/2 I store it somewhere and indicate that choosing length L can lead to a valid solution.
Now I use binary search on L to find the maximum length that will produce substrings with size > n/2, then output those substrings.
I couldn't find a more efficient algorithm, but unfortunately I get a 'memory limit exceeded'. Any idea ??
To avoid MLE, you should not copy the substrings to other places, but just reference the pointers of their original places. And don't use STL string for this problem, but use the C style string.Khaled_912 wrote:I couldn't find a more efficient algorithm, but unfortunately I get a 'memory limit exceeded'. Any idea ??
On the other hand, your algorithm is still a little bit slow. See the answer given by Waterloo: http://plg.uwaterloo.ca/~acm00/060930/data/C.c
Another word, The max length of strings is not 1000 (which is said in the problem), but 1004.
I stay home. Don't call me out.

 Experienced poster
 Posts: 110
 Joined: Tue May 06, 2008 2:18 pm
 Location: CSEDU, Bangladesh
 Contact:
Re: 11107  Life Forms
be careful not to print duplicate strings, and also, there can be only one string.
You should not always say what you know, but you should always know what you say.
Re: 11107  Life Forms
Hi! I'm trying this problem, but have some dificulties solving it. I'm getting WA, and don't see why my algorithm is false. So if someone has some extra input for me to try and figure out what I'm doing wrong, I would very much appreciate it.
Thanks in advance.
Thanks in advance.

 Guru
 Posts: 5947
 Joined: Thu Sep 01, 2011 9:09 am
 Location: San Jose, CA, USA
Re: 11107  Life Forms
Input:AC output:
Code: Select all
4
lrbbmqb
cd
rzowkkyhidd
scd
8
jmowfrxsjybldbef
arcbynecdyggxxpklor
llnmpap
fwkhopkmcoqhnwn
uewhsqmgbbuqc
jjivswmd
q
bx
5
mvtrrbljptnsnfwzqfjm
fadrrwsof
b
nuvqhffbsaqxwpqca
ehchz
8
rkmlno
jkpqpxrjxkitzyxa
bhhki
q
oendtomfgdwdwfcgpxi
vkuytdl
gdewh
acio
12
rdtqkvwcsgs
qoqmsb
agu
nnyqxnzlgdgwp
trwblnsadeuguumoqcdr
betokyxhoachwdvmx
rdryxlmndqtukwag
lej
ukwcibxubum
nmeyatdrmydiajxlogh
q
mzhlvihjouvsuyoypayu
10
eimuotehzriic
skpggkbb
pzzrzucxaml
dfykgruow
giooobpp
eqlwphapjnadqhdcnvwd
xjbmypppha
xnspusgdhiixq
b
jxjc
6
djsuyibyebmwsiqyo
gyxymzevypzvjegeb
ocfuftsxdixti
sieehkchzdflilrjqfn
ztqrsvbspk
hsenbppkqtpddbuotbb
3
wiv
fxjujjddntgeiqvdga
jvw
5
aubwewpjvygehljxepb
iwuqzdzu
dubzvafspqpq
uzi
wovyddwyvvburc
20
gyjgfdxvtnu
neslsp
wuiupfxlzbknhkwp
anltcf
rjcddso
oyvegurfwcsfmoxe
mrjowrghwlkobmeahkg
c
aehhsveymqpxhlrnunyf
zrhb
sjeuygafo
butpnimuwfjqsjxvkqd
rxxvrwc
ds
eogvbpkxlpgdirbfcriq
fpgynkrrefxsn
ucft
wctgtwmx
upycfg
uqunublmoiitnckle
16
zbexrampe
vhqnddjeqvuygpnkaz
frpjvoaxdpcwm
obmskskfojnewxgx
nofwltwjwnnvbwjckd
eouuzhyvhgvwujbqx
pitcvograiddvhrrds
c
hkleewhxtembaqwqwpq
suebnv
gvjwdvjjafqzzxlc
dzncqgjlap
pkvxfgvicetcmkbljop
tqvvhbgsdvivhesnkqx
wrq
drvmhlubbryktheye
14
mrobde
q
rglua
i
veix
jjrqopubjguxhxdipfz
swybg
ylqvjzharv
lyauuzdrcnjk
hclffrkeecbpdi
ufhidjcxjhrnxc
mjcxohqa
xdrmgzebhnlm
pmhwdvthsfqueeexgru
2
g
kmvrzgf
13
rftwapdtutpb
tygnsrxajjngcomikjzs
wssznovdruypcnjulkfu
mxna
am
spckjcazxdrtdgy
qscczybnvqqcqcjitlvc
vbmasidz
w
aatzzwpwmf
fjkncvkelhhz
chpdnlun
ppnlgjznkewwuys
19
fonexpmmsbaop
dgzqmkq
xuvtqvnxbslqzkglzamz
dnsj
lvybwxxttqognrb
iakqlls
khfzconnmo
klp
e
snsmouwqhodsgcfohe
ysh
gxwtoayuvno
dj
tqtwkbap
iujimqwspslgvl
saqbdbgwtbseett
dnfnbyjvpdjxyuzqxst
t
zpct
20
oo
remgfkrbcvkz
gbofthgojhdnay
pnbitoraa
b
dnezw
pdawlohssvtqtkfvsy
jzlucqxswyqd
tdmfrtzlqsekej
zksklfepxchvczys
dgcxbbiswmeaylzi
ktmoik
sfxtgpo
xqiysrsqfw
djqnqcgdq
nl
uieazvmw
uufnnxvloyvgmliuqa
dlyavfauaosnln
ac
11
piumoiawcqxs
kqwgxyazntnaika
eybnuqb
qaggxachrynqxqq
lfo
pqhvokiiammqmvxj
bsoaifzyxnjc
errnmixxsyjhovengb
yqrixqgwdrygxrxkfh
cainhwilkqmbp
szd
17
znzxtzqsjwatycb
jawwmninepfdupl
c
txmkpvgrrgtuseuragel
kcapwpbqro
qawixezqkvl
bhwcocpjmrmbpbegvs
l
qtu
vkesvjtdhvtjmexfq
vufdpaxcwnwq
tbplyzedicwsodpwtqrp
uearh
gfnpaqelofrsotq
k
xipqzeqvlqmuoubb
brpmixfclbstno
1
dkujcpwsdq
16
rkiueziowoqj
iecwxxbjtnmkjgnc
mvau
gtausokbfug
tf
uqbjclvlazamucimicn
wdoxjlf
emdadgkhufs
evjaxrniv
orh
rqqwnujquo
evslqprly
krhunljgsoxleuyyfqu
ozqhmgyety
epfae
j
0
Code: Select all
d
m
b
f
s
w
?
m
p
b
d
e
q
s
i
j
v
w
b
d
u
v
w
z
f
b
e
v
r
g
a
n
z
?
?
a
i
n
q
x
y
e
p
q
dkujcpwsdq
?
Check input and AC output for thousands of problems on uDebug!
Re: 11107  Life Forms
Thanks very much... but my algorithm gives a correct output for this input.

 Guru
 Posts: 5947
 Joined: Thu Sep 01, 2011 9:09 am
 Location: San Jose, CA, USA
Re: 11107  Life Forms
Post your code.
Check input and AC output for thousands of problems on uDebug!