Definition and solution of a new approximate variant of the order preserving matching problem

Niquefa Velasquez, Rafael

Definition and solution of a new approximate variant of the order preserving matching problem

Archivos

msc-tesis-rafael.pdf (1.43 MB)

Autores

Niquefa Velasquez, Rafael

Tipo de contenido

Trabajo de grado - Maestría

Document language:

Español

Fecha

2017

Documentos PDF

Resumen

En esta tesis se combinan dos problemas de búsqueda de cadenas: la búsqueda aproximada de cadenas bajo parámetros δγ, y el emparejamiento con preservación de orden. Uno permite un nivel de error en la búsqueda, mientras que el otro considera la estructura interna de las cadenas en lugar de sus valores absolutos. Se define formalmente el Emparejamiento con preservación de orden bajo distancias δγ. Se diseñaron e implementaron en C++ cuatro algoritmos que resuelven el problema, y una configuración experimental para compararlos. El algoritmo más simple, tiene complejidad O(nm lg m). El segundo tiene una complejidad de O(nm). El tercero y el cuarto se basan en estructuras de datos: árbol de segmentos y árbol de fenwick respectivamente. Ambos tienen complejidad O(nm lg n). Los resultados experimentales muestran que los algoritmos basados en estructuras de datos tiene un mejor rendimiento en muchos casos. El de mejor rendimiento experimental es del basado en el árbol Fenwick, seguido por el basado en árboles de segmentos. Estos resultados se pueden explicar debido a su complejidad Ω(n lg n). Se muestran aplicaciones en música y finanzas.
Abstract: In this thesis we combine two string searching related problems: the approximate string matching under parameters δ and γ, and the order preserving matching problem. Orderpreserving matching regards the internal structure of the strings rather than their absolute values while matching under δ and γ distances permit a level of error. We formally define the δγ–order-preserving matching problem. We designed and implement in C++ four algorithms that solve the proposed problem and an experimental setup to compare them. The first algorithm is the naive algorithm with complexity Θ(nm lg m) time. The second has a complexity of Θ(nm) time. The third and four algorithms are based on the segment tree and Fenwick tree data structures, respectively, and both have O(nm log n) time complexities. The data structure based algorithms show better experimental performance due to their better lower bound of Ω(n lg n) complexity. We show applications in music and finance.

Palabras clave propuestas

Stringology; Order Preserving Pattern Matching; Approximate Matching; Delta Gamma Matching; Búsqueda de cadenas; Análisis experimental de algoritmos; Árbol de Fenwick; Árbol indexado binario; Árbol de segmentos

URI

https://repositorio.unal.edu.co/handle/unal/59919

Colecciones

Maestría en Ingeniería - Sistemas y Computación

Página completa del ítem

Definition and solution of a new approximate variant of the order preserving matching problem

Archivos

Autores

Director

Tipo de contenido

Document language:

Fecha

Título de la revista

ISSN de la revista

Título del volumen

Resumen

Abstract

Palabras clave propuestas

Descripción

Palabras clave

Citación

URI

Colecciones