Full Description
The papers contained in this volume were presented at the 11th Conference on String Processing and Information Retrieval (SPIRE), held October 5-8, 2004 at the Department of Information Engineering of the University of Padova, Italy. They were selected from 123 papers submitted in response to the call for papers. In addition, there were invited lectures by C.J. van Rijsbergen (University of Glasgow, UK) and Setsuo Arikawa (Kyushu University, Japan). In view of the large number of good-quality submissions, some were additionally accepted this year as short abstracts, which also appear in the proceedings.

Papers solicited for SPIRE 2004 were meant to constitute original contributions to areas such as string pattern searching, matching and discovery; data compression; text and data mining; machine learning; tasks, methods, algorithms, media, and evaluation in information retrieval; digital libraries; and applications to and interactions with domains such as genome analysis, speech and natural language processing, Web links and communities, and multilingual data.

SPIRE has its origins in the South American Workshop on String Processing, which was first held in 1993.
Starting in 1998, the focus of the symposium was broadened to include the area of information retrieval, owing to the common emphasis on information processing. The first 10 meetings were held in Belo Horizonte (Brazil, 1993), Valparaiso (Chile, 1995), Recife (Brazil, 1996), Valparaiso (Chile, 1997), Santa Cruz (Bolivia, 1998), Cancun (Mexico, 1999), A Coruña (Spain, 2000), Laguna San Rafael (Chile, 2001), Lisbon (Portugal, 2002), and Manaus (Brazil, 2003).
Contents
Efficient One Dimensional Real Scaled Matching
Linear Time Algorithm for the Longest Common Repeat Problem
Automaton-Based Sublinear Keyword Pattern Matching
Techniques for Efficient Query Expansion
Inferring Query Performance Using Pre-retrieval Predictors
A Scalable System for Identifying Co-derivative Documents
Searching for a Set of Correlated Patterns
Linear Nondeterministic Dawg String Matching Algorithm (Abstract)
Permuted and Scaled String Matching
Bit-Parallel Branch and Bound Algorithm for Transposition Invariant LCS
A New Feature Normalization Scheme Based on Eigenspace for Noisy Speech Recognition
Fast Detection of Common Sequence Structure Patterns in RNAs
An Efficient Algorithm for the Longest Tandem Scattered Subsequence Problem
Automatic Document Categorization Based on k-NN and Object-Based Thesauri
Indexing Text Documents Based on Topic Identification
Cross-Comparison for Two-Dimensional Text Categorization
DDOC: Overlapping Clustering of Words for Document Classification
Evaluation of Web Page Representations by Content Through Clustering
Evaluating Relevance Feedback and Display Strategies for Searching on Small Displays
Information Extraction by Embedding HMM to the Set of Induced Linguistic Features
Finding Cross-Lingual Spelling Variants
An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size
An Alphabet-Friendly FM-Index
Concurrency Control and I/O-Optimality in Bulk Insertion
Processing Conjunctive and Phrase Queries with the Set-Based Model
Metric Indexing for the Vector Model in Text Retrieval
Negations and Document Length in Logical Retrieval
An Improvement and an Extension on the Hybrid Index for Approximate String Matching
First Huffman, Then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index
Metric Indexes for Approximate String Matching in a Dictionary
Simple Implementation of String B-Trees
Alphabet Permutation for Differentially Encoding Text
A Space-Saving Linear-Time Algorithm for Grammar-Based Compression
Simple, Fast, and Efficient Natural Language Adaptive Compression
Searching XML Documents Using Relevance Propagation
Dealing with Syntactic Variation Through a Locality-Based Approach
Efficient Extraction of Structured Motifs Using Box-Links
Efficient Computation of Balancedness in Binary Sequence Generators
On Asymptotic Finite-State Error Repair
New Algorithms for Finding Monad Patterns in DNA Sequences
Motif Extraction from Weighted Sequences
Longest Motifs with a Functionally Equivalent Central Block
On the Transformation Distance Problem
On Classification of Strings