Number of distinct substrings of a string [continuation]

n00m n00m at narod.ru
Sun Nov 29 09:10:46 EST 2009


On Nov 29, 3:15 pm, Bearophile <bearophileH... at lycos.com> wrote:
> Maybe in your C++ code there's something that can be improved, this is
> a 1:1 translation to D (V.1) language (using dlibs) and it's about 2.2
> times faster than the Psyco version:http://codepad.org/MQLj0ydB
> Using a smarter usage of memory (that is avoiding all or most memory
> allocations inside all loops), the performance difference will surely
> grow.

Very interesting. Thanks.
D code looks pretty neat. Btw D lang is among accepted langs there.
Even if Py by 4x *slower* -- it's still a perfect Ok for it (C# will
be
much (much) slower than Python).

Micro-optimizations.
Of course, the best optimization would be to implement Suffix Tree:
http://en.wikipedia.org/wiki/Trie
Currently I hardly understand/know/read/etc its core idea. My algo is
plainly stupid as soldier muddy boots.

My C++ code:

<BEGIN>
#include <stdlib.h>
#include <stdio.h>
//#include <string.h>
//#include <ctype.h>
//#include <math.h>
#include <time.h>
#include <iostream>
#include <vector>
#include <string>
using namespace std;

int main() {
    clock_t start_time = clock();
    freopen("88.txt", "rt", stdin);
    freopen("99.txt", "wt", stdout);

    int tcs;
    string s;
    cin >> tcs;
    while (tcs-- > 0) {
        cin >> s;
        int n = s.size();
        s = s + ' ';
        vector< vector<int> > a(128);
        int ans = 0;
        for (int i = n - 1; i >= 0; --i) {
            int lev = 0;
            for (int st = (int)a[s[i]].size() - 1; st >= 0; --st) {
                int j = a[s[i]][st];
                if (n - j <= lev) break;
                if (s[j + lev] != s[i + lev]) continue;
                if (s[j + lev / 2] != s[i + lev / 2]) continue;
                int k = 0;
                while (s[j + k] == s[i + k]) ++k;
                if (k > lev) lev = k;
            }
            a[s[i]].push_back(i);
            ans += n - i - lev;
        }
        cout << ans << endl;
    }

    cout << (clock() - start_time) / CLOCKS_PER_SEC << endl;

return 0;
}
<END>










More information about the Python-list mailing list