Results
In this article we present and analyze
\(\mathsf {eGSA}\) [introduced in CPM (External memory generalized suffix and
\(\mathsf {LCP}\) arrays construction. In: Proceedings of CPM. pp 201–10,
2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory,
\(\mathsf {eGSA}\) showed a competitive performance when compared to
\(\mathsf {eSAIS}\) and
\(\mathsf {SAscan}\), which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm.
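
To make the output of such a construction concrete, the following minimal sketch builds a generalized suffix array and its longest common prefix array for a small string collection in internal memory. It is a naive illustration of the data structures only, not the \(\mathsf {eGSA}\) algorithm: it sorts suffixes by direct string comparison, which is exactly what the buffer, induced sorting and heap combination is meant to avoid. The function name, the `$` terminator, the tie-breaking of equal suffixes by string index, and the convention that terminators do not count towards the LCP are all assumptions made for this example.

```python
def generalized_suffix_array(strings):
    """Naive in-memory GSA and LCP construction, for illustration only.

    Assumptions (not taken from the article): each string is terminated
    by "$", which must not occur inside the strings and must sort before
    every other symbol; equal suffixes are ordered by string index; a
    terminator never contributes to an LCP value.
    """
    suffixes = []
    for i, s in enumerate(strings):
        t = s + "$"
        for j in range(len(t)):
            # Keep the suffix text together with (string index, offset).
            suffixes.append((t[j:], i, j))

    # Direct comparison sort: simple, but quadratic-ish in time and space.
    suffixes.sort()
    gsa = [(i, j) for _, i, j in suffixes]

    # LCP[k] = length of the common prefix of suffixes k-1 and k.
    lcp = [0]
    for (a, _, _), (b, _, _) in zip(suffixes, suffixes[1:]):
        k = 0
        while k < len(a) and k < len(b) and a[k] == b[k] and a[k] != "$":
            k += 1
        lcp.append(k)
    return gsa, lcp


if __name__ == "__main__":
    gsa, lcp = generalized_suffix_array(["ab", "b"])
    for (i, j), l in zip(gsa, lcp):
        print(i, j, l)
```

Each entry of the generalized suffix array is a pair (string index, offset), so a suffix can be traced back to the string it came from. An external memory algorithm has to produce the same arrays without ever holding all suffixes in memory as this sketch does.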