Mastering Regular Expressions : Powerful Techniques for Perl and Other Tools

  • Binding: Paperback, 400 pages
  • Language: English
  • Product code: 9780596002893
  • DDC classification: 005.13

Full Description


Regular expressions are a powerful tool for manipulating text and data. They are now a standard feature in a range of languages and popular tools, including Perl, Java, VB.NET and C# (and any language using the .NET Framework), PHP, Python, Ruby, Tcl, MySQL, awk, and Emacs. This volume has been updated to cover the new features of Perl 5.8, as well as several other languages, including Java, VB.NET, C#, Python, JavaScript, Tcl, and Ruby. It offers solutions to complex real-world problems and information that can be put to immediate use. Topics covered include: a comparison of features among different versions of many languages and tools; how the regular expression engine works; optimization; matching just what you want, but not what you don't want; and sections and chapters on individual languages.

Table of Contents

Preface xv
Introduction to Regular Expressions 1 (34)
Solving Real Problems 2 (2)
Regular Expressions as a Language 4 (2)
The Filename Analogy 4 (1)
The Language Analogy 5 (1)
The Regular-Expression Frame of Mind 6 (2)
If You Have Some Regular-Expression Experience 6 (1)
Searching Text Files: Egrep 6 (2)
Egrep Metacharacters 8 (15)
Start and End of the Line 8 (1)
Character Classes 9 (2)
Matching Any Character with Dot 11 (2)
Alternation 13 (1)
Ignoring Differences in Capitalization 14 (1)
Word Boundaries 15 (1)
In a Nutshell 16 (1)
Optional Items 17 (1)
Other Quantifiers: Repetition 18 (2)
Parentheses and Backreferences 20 (2)
The Great Escape 22 (1)
Expanding the Foundation 23 (10)
Linguistic Diversification 23 (1)
The Goal of a Regular Expression 23 (1)
A Few More Examples 23 (4)
Regular Expression Nomenclature 27 (3)
Improving on the Status Quo 30 (2)
Summary 32 (1)
Personal Glimpses 33 (2)
Extended Introductory Examples 35 (48)
About the Examples 36 (2)
A Short Introduction to Perl 37 (1)
Matching Text with Regular Expressions 38 (12)
Toward a More Real-World Example 40 (1)
Side Effects of a Successful Match 40 (3)
Intertwined Regular Expressions 43 (6)
Intermission 49 (1)
Modifying Text with Regular Expressions 50 (33)
Example: Form Letter 50 (1)
Example: Prettifying a Stock Price 51 (2)
Automated Editing 53 (1)
A Small Mail Utility 53 (6)
Adding Commas to a Number with Lookaround 59 (8)
Text-to-HTML Conversion 67 (10)
That Doubled-Word Thing 77 (6)
Overview of Regular Expression Features and Flavors 83 (60)
A Casual Stroll Across the Regex Landscape 85 (8)
The Origins of Regular Expressions 85 (6)
At a Glance 91 (2)
Care and Handling of Regular Expressions 93 (8)
Integrated Handling 94 (1)
Procedural and Object-Oriented Handling 95 (2)
A Search-and-Replace Example 97 (2)
Search and Replace in Other Languages 99 (2)
Care and Handling: Summary 101(1)
Strings, Character Encodings, and Modes 101(11)
Strings as Regular Expressions 101(4)
Character-Encoding Issues 105(4)
Regex Modes and Match Modes 109(3)
Common Metacharacters and Features 112(29)
Character Representations 114(3)
Character Classes and Class-Like Constructs 117(10)
Anchors and Other "Zero-Width Assertions" 127(6)
Comments and Mode Modifiers 133(2)
Grouping, Capturing, Conditionals, and Control 135(6)
Guide to the Advanced Chapters 141(2)
The Mechanics of Expression Processing 143(42)
Start Your Engines! 143(4)
Two Kinds of Engines 144(1)
New Standards 144(1)
Regex Engine Types 145(1)
From the Department of Redundancy Department 146(1)
Testing the Engine Type 146(1)
Match Basics 147(6)
About the Examples 147(1)
Rule 1: The Match That Begins Earliest Wins 148(1)
Engine Pieces and Parts 149(2)
Rule 2: The Standard Quantifiers Are Greedy 151(2)
Regex-Directed Versus Text-Directed 153(4)
NFA Engine: Regex-Directed 153(2)
DFA Engine: Text-Directed 155(1)
First Thoughts: NFA and DFA in Comparison 156(1)
Backtracking 157(6)
A Really Crummy Analogy 158(1)
Two Important Points on Backtracking 159(1)
Saved States 159(3)
Backtracking and Greediness 162(1)
More About Greediness and Backtracking 163(14)
Problems of Greediness 164(1)
Multi-Character "Quotes" 165(1)
Using Lazy Quantifiers 166(1)
Greediness and Laziness Always Favor a Match 167(1)
The Essence of Greediness, Laziness, and Backtracking 168(1)
Possessive Quantifiers and Atomic Grouping 169(3)
Possessive Quantifiers, ?+, *+, ++, and {m,n}+ 172(1)
The Backtracking of Lookaround 173(1)
Is Alternation Greedy? 174(1)
Taking Advantage of Ordered Alternation 175(2)
NFA, DFA, and POSIX 177(6)
"The Longest-Leftmost" 177(1)
POSIX and the Longest-Leftmost Rule 178(1)
Speed and Efficiency 179(1)
Summary: NFA and DFA in Comparison 180(3)
Summary 183(2)
Practical Regex Techniques 185(36)
Practical Regex Balancing Act 186(1)
A Few Short Examples 186(14)
Continuing with Continuation Lines 186(1)
Matching an IP Address 187(3)
Working with Filenames 190(3)
Matching Balanced Sets of Parentheses 193(1)
Watching Out for Unwanted Matches 194(2)
Matching Delimited Text 196(2)
Knowing Your Data and Making Assumptions 198(1)
Stripping Leading and Trailing Whitespace 199(1)
HTML-Related Examples 200(8)
Matching an HTML Tag 200(1)
Matching an HTML Link 201(2)
Examining an HTTP URL 203(1)
Validating a Hostname 203(2)
Plucking Out a URL in the Real World 205(3)
Extended Examples 208(13)
Keeping in Sync with Your Data 208(4)
Parsing CSV Files 212(9)
Crafting an Efficient Expression 221(62)
A Sobering Example 222(6)
A Simple Change: Placing Your Best Foot Forward 223(1)
Efficiency Versus Correctness 223(2)
Advancing Further: Localizing the Greediness 225(1)
Reality Check 226(2)
A Global View of Backtracking 228(4)
More Work for a POSIX NFA 229(1)
Work Required During a Non-Match 230(1)
Being More Specific 231(1)
Alternation Can Be Expensive 231(1)
Benchmarking 232(7)
Know What You're Measuring 234(1)
Benchmarking with Java 234(2)
Benchmarking with VB.NET 236(1)
Benchmarking with Python 237(1)
Benchmarking with Ruby 238(1)
Benchmarking with Tcl 239(1)
Common Optimizations 239(13)
No Free Lunch 240(1)
Everyone's Lunch is Different 240(1)
The Mechanics of Regex Application 241(1)
Pre-Application Optimizations 242(3)
Optimizations with the Transmission 245(2)
Optimizations of the Regex Itself 247(5)
Techniques for Faster Expressions 252(9)
Common Sense Techniques 254(1)
Expose Literal Text 255(1)
Expose Anchors 255(1)
Lazy Versus Greedy: Be Specific 256(1)
Split Into Multiple Regular Expressions 257(1)
Mimic Initial-Character Discrimination 258(1)
Use Atomic Grouping and Possessive Quantifiers 259(1)
Lead the Engine to a Match 260(1)
Unrolling the Loop 261(16)
Method 1: Building a Regex From Past Experiences 262(1)
The Real "Unrolling-the-Loop" Pattern 263(3)
Method 2: A Top-Down View 266(1)
Method 3: An Internet Hostname 267(1)
Observations 268(1)
Using Atomic Grouping and Possessive Quantifiers 268(2)
Short Unrolling Examples 270(2)
Unrolling C Comments 272(5)
The Freeflowing Regex 277(4)
A Helping Hand to Guide the Match 277(2)
A Well-Guided Regex is a Fast Regex 279(1)
Wrapup 280(1)
In Summary: Think! 281(2)
Perl 283(82)
Regular Expressions as a Language Component 285(1)
Perl's Greatest Strength 286(1)
Perl's Greatest Weakness 286(1)
Perl's Regex Flavor 286(7)
Regex Operands and Regex Literals 288(4)
How Regex Literals Are Parsed 292(1)
Regex Modifiers 292(1)
Regex-Related Perlisms 293(10)
Expression Context 294(1)
Dynamic Scope and Regex Match Effects 295(4)
Special Variables Modified by a Match 299(4)
The qr/.../ Operator and Regex Objects 303(3)
Building and Using Regex Objects 303(2)
Viewing Regex Objects 305(1)
Using Regex Objects for Efficiency 306(1)
The Match Operator 306(12)
Match's Regex Operand 307(1)
Specifying the Match Target Operand 308(1)
Different Uses of the Match Operator 309(3)
Iterative Matching: Scalar Context, with /g 312(4)
The Match Operator's Environmental Relations 316(2)
The Substitution Operator 318(3)
The Replacement Operand 319(1)
The /e Modifier 319(2)
Context and Return Value 321(1)
The Split Operator 321(5)
Basic Split 322(2)
Returning Empty Elements 324(1)
Split's Special Regex Operands 325(1)
Split's Match Operand with Capturing Parentheses 326(1)
Fun with Perl Enhancements 326(21)
Using a Dynamic Regex to Match Nested Pairs 328(3)
Using the Embedded-Code Construct 331(4)
Using local in an Embedded-Code Construct 335(3)
A Warning About Embedded Code and my Variables 338(2)
Matching Nested Constructs with Embedded Code 340(1)
Overloading Regex Literals 341(3)
Problems with Regex-Literal Overloading 344(1)
Mimicking Named Capture 344(3)
Perl Efficiency Issues 347(16)
"There's More Than One Way to Do It" 348(1)
Regex Compilation, the /o Modifier, qr/.../, and Efficiency 348(7)
Understanding the "Pre-Match" Copy 355(4)
The Study Function 359(1)
Benchmarking 360(1)
Regex Debugging Information 361(2)
Final Comments 363(2)
Java 365(34)
Judging a Regex Package 366(2)
Technical Issues 366(1)
Social and Political Issues 367(1)
Object Models 368(4)
A Few Abstract Object Models 368(4)
Growing Complexity 372(1)
Packages, Packages, Packages 372(6)
Why So Many "Perl5" Flavors? 375(1)
Lies, Damn Lies, and Benchmarks 375(2)
Recommendations 377(1)
Sun's Regex Package 378(14)
Regex Flavor 378(3)
Using java.util.regex 381(2)
The Pattern.compile() Factory 383(1)
The Matcher Object 384(6)
Other Pattern Methods 390(2)
A Quick Look at Jakarta-ORO 392(7)
ORO's Perl5Util 392(1)
A Mini Perl5Util Reference 393(4)
Using ORO's Underlying Classes 397(2)
.NET 399(34)
.NET's Regex Flavor 400(7)
Additional Comments on the Flavor 402(5)
Using .NET Regular Expressions 407(5)
Regex Quickstart 407(2)
Package Overview 409(1)
Core Object Overview 410(2)
Core Object Details 412(13)
Creating Regex Objects 413(2)
Using Regex Objects 415(6)
Using Match Objects 421(3)
Using Group Objects 424(1)
Static "Convenience" Functions 425(1)
Regex Caching 426(1)
Support Functions 426(1)
Advanced .NET 427(6)
Regex Assemblies 428(2)
Matching Nested Constructs 430(1)
Capture Objects 431(2)
Index 433