1 /**************************************************************
2 JLex: A Lexical Analyzer Generator for Java(TM)
3 Written by Elliot Berk <ejberk@cs.princeton.edu>. Copyright 1996.
4 Maintained by C. Scott Ananian <cananian@alumni.princeton.edu>.
5 See below for copyright notice, license, and disclaimer.
6 New releases from http://www.cs.princeton.edu/~appel/modern/java/JLex/
7
8 Version 1.2.5, 7/25/99-5/16/00, [C. Scott Ananian]
9 Stomped on one more 8-bit character bug. Should work now (really!).
10 Added unicode support, including unicode escape sequences.
11 Rewrote internal JavaLexBitSet class as SparseBitSet for efficient
12 unicoding.
13 Added an NFA character class simplification pass for unicode efficiency.
14 Changed byte- and stream-oriented I/O routines to use characters and
15 java.io.Reader and java.io.Writer instead --- which means we read in
16 unicode specifications correctly and write out a proper unicode java
17 source file. As a happy side-effect, the output java file is written
18 with your platform's preferred newline character(s).
19 Rewrote CInput to fix bugs with line-counting in the specification file
20 and "unusual behaviour" when the last line of the specification wasn't
21 terminated with a newline. Thanks to Matt Hanna <mhanna@cs.caltech.edu>
22 for pointing out the bug.
23 Fixed a bug that would cause JLex not to terminate given certain input
24 specifications. Thanks to Mark Greenstreet <mrg@cs.ubc.ca> and
25 Frank B. Brokken <frank@suffix.icce.rug.nl> for reporting this.
26 CUP parser integration improved according to suggestions made by
27 David MacMahon <davidm@smartsc.com>. The %cup directive now tells
28 JLex to generate a parser conforming to the java_cup.runtime.Scanner
29 interface; see manual for more details.
30 Fixed bug with null string literals ("") in regexps. Reported by
31 Charles Fischer <fischer@cs.wisc.edu>.
32 Rewrote start-of-line and end-of-line handling, closing active bug #5.
33 Also fixed line-counting code, closing active bug #12. All
34 new-line handling is now platform-independent.
35 Used unpackFromString more extensively to allow larger cmap, etc,
36 tables. This helps unicode support work reliably. It's also
37 prettier now if you happen to read the source to the generated
38 lexer.
39 Generated lexer now accepts unicode LS (U+2028) and PS (U+2029) as
40 line separators for strict unicode compliance; see
41 http://www.unicode.org/unicode/reports/tr18/
42 Fixed bug with character constants in action strings. Reported by
43 Andrew Appel against 1.2.5b3.
44 Fixed bug with illegal \^C-style escape sequences. Reported by
45 Toshiya Iwai <iwai@isdnet.co.jp> against 1.2.5b4.
46 Fixed "newline in quoted string" error when unpaired single- or
47 double-quotes were present in comments in the action phrase.
48 Reported by Stephen Ostermiller <1010JLex@ostermiller.com>
49 against 1.2.5b4. Reported by Eric Esposito <eric.esposito@unh.edu>
50 against 1.2.4 and 1.2.5b2.
51 Fixed "newline in quoted string" error when /* or // appeared
52 in quoted strings in the action phrase. Reported by
53 David Eichmann <david-eichmann@uiowa.edu> against 1.2.5b5.
54 Fixed 'illegal constant' errors in case statements caused by
55 Sun's JDK 1.3 more closely adhering to the Java Language
56 Specification. Reported by a number of people, but
57 Harold Grovesteen <hgrovesteen@home.com> was the first to direct me to
58 a Sun bug report (4119776) which quoted the relevant section of the
59 JLS (15.27) to convince me that the JLex construction actually was
60 illegal. Reported against 1.2.5b6, but this bit of code has been
61 present since the very first version of JLex (1.1.1).
62
63 Version 1.2.4, 7/24/99, [C. Scott Ananian]
64 Correct the parsing of '-' in character classes, closing active
65 bug #1. Behaviour follows egrep: leading and trailing dashes in
66 a character class lose their special meaning, so [-+] and [+-] do
67 what you would expect them to.
68 New %ignorecase directive for generating case-insensitive lexers by
69 expanding matched character classes in a unicode-friendly way.
70 Handle unmatched braces in quoted strings or comments within
71 action code blocks.
72 Fixed input lexer to allow whitespace in character classes, closing
73 active bug #9. Whitespace in quotes had been previously fixed.
74 Made Yylex.YYEOF and %yyeof work like the manual says they should.
75
76 Version 1.2.3, 6/26/97, [Raimondas Lencevicius]
77 Fixed the yy_nxt[][] assignment that has generated huge code
78 exceeding 64K method size limit. Now the assignment
79 is handled by unpacking a string encoding of integer array.
80 To achieve that, added
81 "private int [][] unpackFromString(int size1, int size2, String st)"
82 function and coded the yy_nxt[][] values into a string
83 by printing integers into a string and representing
84 integer sequences as "value:length" pairs.
85 Improvement: generated .java file reduced 2 times, .class file
86 reduced 6 times for sample grammar. No 64K errors.
87 Possible negatives: Some editors and OSs may not be able to handle
88 the huge one-line generated string. String unpacking may be slower
89 than direct array initialization.
90
91 Version 1.2.2, 10/24/97, [Martin Dirichs]
92 Notes:
93 Changed yy_instream to yy_reader of type BufferedReader. This reflects
94 the improvements in the JDK 1.1 concerning InputStreams. As a
95 consequence, changed yy_buffer from byte[] to char[].
96 The lexer can now be initialized with either an InputStream
97 or a Reader. A third, private constructor is called by the other
98 two to execute user specified constructor code.
99
100 Version 1.2.1, 9/15/97 [A. Appel]
101 Fixed bugs 6 (character codes > 127) and 10 (deprecated String constructor).
102
103 Version 1.2, 5/5/97, [Elliot Berk]
104 Notes:
105 Simply changed the name from JavaLex to JLex. No other changes.
106
107 Version 1.1.5, 2/25/97, [Elliot Berk]
108 Notes:
109 Simple optimization to the creation of the source files.
110 Added a BufferedOutputStream in the creation of the DataOutputStream
111 field m_outstream of the class CLexGen. This helps performance by
112 doing some buffering, and was suggested by Max Hailperin,
113 Associate Professor of Computer Science, Gustavus Adolphus College.
114
115 Version 1.1.4, 12/12/96, [Elliot Berk]
116 Notes:
117 Added %public directive to make generated class public.
118
119 Version 1.1.3, 12/11/96, [Elliot Berk]
120 Notes:
121 Converted assertion failure on invalid character class
122 when a dash '-' is not preceded with a start-of-range character.
123 Converted this into parse error E_DASH.
124
125 Version 1.1.2, October 30, 1996 [Elliot Berk]
126 Fixed BitSet bugs by installing a BitSet class of my own,
127 called JavaLexBitSet. Fixed support for '\r', non-UNIX
128 sequences. Added try/catch block around lexer generation
129 in main routine to moderate error information presented
130 to user. Fixed macro expansion, so that macros following
131 quotes are expanded correctly in regular expressions.
132 Fixed dynamic reallocation of accept action buffers.
133
134 Version 1.1.1, September 3, 1996 [Andrew Appel]
135 Made the class "Main" instead of "JavaLex",
136 improved the installation instructions to reflect this.
137
138 Version 1.1, August 15, 1996 [Andrew Appel]
139 Made yychar, yyline, yytext global to the lexer so that
140 auxiliary functions can access them.
141 **************************************************************/
142
143 /***************************************************************
144 JLEX COPYRIGHT NOTICE, LICENSE, AND DISCLAIMER
145 Copyright 1996-2000 by Elliot Joel Berk and C. Scott Ananian
146
147 Permission to use, copy, modify, and distribute this software and its
148 documentation for any purpose and without fee is hereby granted,
149 provided that the above copyright notice appear in all copies and that
150 both the copyright notice and this permission notice and warranty
151 disclaimer appear in supporting documentation, and that the name of
152 the authors or their employers not be used in advertising or publicity
153 pertaining to distribution of the software without specific, written
154 prior permission.
155
156 The authors and their employers disclaim all warranties with regard to
157 this software, including all implied warranties of merchantability and
158 fitness. In no event shall the authors or their employers be liable
159 for any special, indirect or consequential damages or any damages
160 whatsoever resulting from loss of use, data or profits, whether in an
161 action of contract, negligence or other tortious action, arising out
162 of or in connection with the use or performance of this software.
163 **************************************************************/
164
165 /***************************************************************
166 Package Declaration
167 **************************************************************/
168 package jlex;
169
170 /***************************************************************
171 Imported Packages
172 **************************************************************/
173 import java.lang.System;
174 import java.lang.Integer;
175 import java.lang.Character;
176
177 import java.util.Enumeration;
178 import java.util.Stack;
179 import java.util.Hashtable;
180 import java.util.Vector;
181
182 /******************************
183 Questions:
184 2) How should I use the Java package system
185 to make my tool more modularized and
186 coherent?
187
188 Unimplemented:
189 !) Fix BitSet issues -- expand only when necessary.
190 2) Repeated accept rules.
191 6) Clean up the CAlloc class and use buffered
192 allocation.
193 9) Add to spec about extending character set.
194 11) m_verbose -- what should be done with it?
195 12) turn lexical analyzer into a coherent
196 Java package
197 13) turn lexical analyzer generator into a
198 coherent Java package
199 16) pretty up generated code
200 17) make it possible to have white space in
201 regular expressions
202 18) clean up all of the class files the lexer
203 generator produces when it is compiled,
204 and reduce this number in some way.
205 24) character format to and from file: writeup
206 and implementation
207 25) Debug by testing all arcane regular expression cases.
208 26) Look for and fix all UNDONE comments below.
209 27) Fix package system.
210 28) Clean up unnecessary classes.
211 *****************************/
212
213 /***************************************************************
214 Class: CSpec
215 **************************************************************/
216 class CSpec
217 {
218 /***************************************************************
219 Member Variables
220 **************************************************************/
221
222 /* Lexical States. */
223 Hashtable m_states; /* Hashtable taking state indices (Integer)
224 to state name (String). */
225
226 /* Regular Expression Macros. */
227 Hashtable m_macros; /* Hashtable taking macro name (String)
228 to corresponding char buffer that
229 holds macro definition. */
230
231 /* NFA Machine. */
232 CNfa m_nfa_start; /* Start state of NFA machine. */
233 Vector m_nfa_states; /* Vector of states, with index
234 corresponding to label. */
235
236 Vector m_state_rules[]; /* An array of Vectors of Integers.
237 The ith Vector represents the lexical state
238 with index i. The contents of the ith
239 Vector are the indices of the NFA start
240 states that can be matched while in
241 the ith lexical state. */
242
243
244 int m_state_dtrans[];
245
246 /* DFA Machine. */
247 Vector m_dfa_states; /* Vector of states, with index
248 corresponding to label. */
249 Hashtable m_dfa_sets; /* Hashtable taking set of NFA states
250 to corresponding DFA state,
251 if the latter exists. */
252
253 /* Accept States and Corresponding Anchors. */
254 Vector m_accept_vector;
255 int m_anchor_array[];
256
257 /* Transition Table. */
258 Vector m_dtrans_vector;
259 int m_dtrans_ncols;
260 int m_row_map[];
261 int m_col_map[];
262
263 /* Special pseudo-characters for beginning-of-line and end-of-file. */
264 static final int NUM_PSEUDO=2;
265 int BOL; // beginning-of-line
266 int EOF; // end-of-line
267
268 /** NFA character class minimization map. */
269 int m_ccls_map[];
270
271 /* Regular expression token variables. */
272 int m_current_token;
273 char m_lexeme;
274 boolean m_in_quote;
275 boolean m_in_ccl;
276
277 /* Verbose execution flag. */
278 boolean m_verbose;
279
280 /* JLex directives flags. */
281 boolean m_integer_type;
282 boolean m_intwrap_type;
283 boolean m_yyeof;
284 boolean m_count_chars;
285 boolean m_count_lines;
286 boolean m_cup_compatible;
287 boolean m_unix;
288 boolean m_public;
289 boolean m_ignorecase;
290
291 char m_init_code[];
292 int m_init_read;
293
294 char m_init_throw_code[];
295 int m_init_throw_read;
296
297 char m_class_code[];
298 int m_class_read;
299
300 char m_eof_code[];
301 int m_eof_read;
302
303 char m_eof_value_code[];
304 int m_eof_value_read;
305
306 char m_eof_throw_code[];
307 int m_eof_throw_read;
308
309 char m_yylex_throw_code[];
310 int m_yylex_throw_read;
311
312 /* Class, function, type names. */
313 char m_class_name[] = {
314 'Y', 'y', 'l',
315 'e', 'x'
316 };
317 char m_implements_name[] = {};
318 char m_function_name[] = {
319 'y', 'y', 'l',
320 'e', 'x'
321 };
322 char m_type_name[] = {
323 'Y', 'y', 't',
324 'o', 'k', 'e',
325 'n'
326 };
327
328 /* Lexical Generator. */
329 private CLexGen m_lexGen;
330
331 /***************************************************************
332 Constants
333 ***********************************************************/
334 static final int NONE = 0;
335 static final int START = 1;
336 static final int END = 2;
337
338 /***************************************************************
339 Function: CSpec
340 Description: Constructor.
341 **************************************************************/
342 CSpec
343 (
344 CLexGen lexGen
345 )
346 {
347 m_lexGen = lexGen;
348
349 /* Initialize regular expression token variables. */
350 m_current_token = m_lexGen.EOS;
351 m_lexeme = '\0';
352 m_in_quote = false;
353 m_in_ccl = false;
354
355 /* Initialize hashtable for lexer states. */
356 m_states = new Hashtable();
357 m_states.put(new String("YYINITIAL"),new Integer(m_states.size()));
358
359 /* Initialize hashtable for lexical macros. */
360 m_macros = new Hashtable();
361
362 /* Initialize variables for lexer options. */
363 m_integer_type = false;
364 m_intwrap_type = false;
365 m_count_lines = false;
366 m_count_chars = false;
367 m_cup_compatible = false;
368 m_unix = true;
369 m_public = false;
370 m_yyeof = false;
371 m_ignorecase = false;
372
373 /* Initialize variables for JLex runtime options. */
374 m_verbose = true;
375
376 m_nfa_start = null;
377 m_nfa_states = new Vector();
378
379 m_dfa_states = new Vector();
380 m_dfa_sets = new Hashtable();
381
382 m_dtrans_vector = new Vector();
383 m_dtrans_ncols = CUtility.MAX_SEVEN_BIT + 1;
384 m_row_map = null;
385 m_col_map = null;
386
387 m_accept_vector = null;
388 m_anchor_array = null;
389
390 m_init_code = null;
391 m_init_read = 0;
392
393 m_init_throw_code = null;
394 m_init_throw_read = 0;
395
396 m_yylex_throw_code = null;
397 m_yylex_throw_read = 0;
398
399 m_class_code = null;
400 m_class_read = 0;
401
402 m_eof_code = null;
403 m_eof_read = 0;
404
405 m_eof_value_code = null;
406 m_eof_value_read = 0;
407
408 m_eof_throw_code = null;
409 m_eof_throw_read = 0;
410
411 m_state_dtrans = null;
412
413 m_state_rules = null;
414 }
415 }
416
417 /***************************************************************
418 Class: CEmit
419 **************************************************************/
420 class CEmit
421 {
422 /***************************************************************
423 Member Variables
424 **************************************************************/
425 private CSpec m_spec;
426 private java.io.PrintWriter m_outstream;
427
428 /***************************************************************
429 Constants: Anchor Types
430 **************************************************************/
431 private final int START = 1;
432 private final int END = 2;
433 private final int NONE = 4;
434
435 /***************************************************************
436 Constants
437 **************************************************************/
438 private final boolean EDBG = true;
439 private final boolean NOT_EDBG = false;
440
441 /***************************************************************
442 Function: CEmit
443 Description: Constructor.
444 **************************************************************/
445 CEmit
446 (
447 )
448 {
449 reset();
450 }
451
452 /***************************************************************
453 Function: reset
454 Description: Clears member variables.
455 **************************************************************/
456 private void reset
457 (
458 )
459 {
460 m_spec = null;
461 m_outstream = null;
462 }
463
464 /***************************************************************
465 Function: set
466 Description: Initializes member variables.
467 **************************************************************/
468 private void set
469 (
470 CSpec spec,
471 java.io.PrintWriter outstream
472 )
473 {
474 if (CUtility.DEBUG)
475 {
476 CUtility.assert(null != spec);
477 CUtility.assert(null != outstream);
478 }
479
480 m_spec = spec;
481 m_outstream = outstream;
482 }
483
484 /***************************************************************
485 Function: emit_imports
486 Description: Emits import packages at top of
487 generated source file.
488 **************************************************************/
489 /*void emit_imports
490 (
491 CSpec spec,
492 OutputStream outstream
493 )
494 throws java.io.IOException
495 {
496 set(spec,outstream);
497
498 if (CUtility.DEBUG)
499 {
500 CUtility.assert(null != m_spec);
501 CUtility.assert(null != m_outstream);
502 }*/
503
504 /*m_outstream.println("import java.lang.String;");
505 m_outstream.println("import java.lang.System;");
506 m_outstream.println("import java.io.BufferedReader;");
507 m_outstream.println("import java.io.InputStream;");*/
508 /*
509 reset();
510 }*/
511
512 /***************************************************************
513 Function: print_details
514 Description: Debugging output.
515 **************************************************************/
516 private void print_details
517 (
518 )
519 {
520 int i;
521 int j;
522 int next;
523 int state;
524 CDTrans dtrans;
525 CAccept accept;
526 boolean tr;
527
528 System.out.println("---------------------- Transition Table "
529 + "----------------------");
530
531 for (i = 0; i < m_spec.m_row_map.length; ++i)
532 {
533 System.out.print("State " + i);
534
535 accept = (CAccept) m_spec.m_accept_vector.elementAt(i);
536 if (null == accept)
537 {
538 System.out.println(" [nonaccepting]");
539 }
540 else
541 {
542 System.out.println(" [accepting, line "
543 + accept.m_line_number
544 + " <"
545 + (new java.lang.String(accept.m_action,0,
546 accept.m_action_read))
547 + ">]");
548 }
549 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(m_spec.m_row_map[i]);
550
551 tr = false;
552 state = dtrans.m_dtrans[m_spec.m_col_map[0]];
553 if (CDTrans.F != state)
554 {
555 tr = true;
556 System.out.print("\tgoto " + state + " on [" + ((char) 0));
557 }
558 for (j = 1; j < m_spec.m_dtrans_ncols; ++j)
559 {
560 next = dtrans.m_dtrans[m_spec.m_col_map[j]];
561 if (state == next)
562 {
563 if (CDTrans.F != state)
564 {
565 System.out.print((char) j);
566 }
567 }
568 else
569 {
570 state = next;
571 if (tr)
572 {
573 System.out.println("]");
574 tr = false;
575 }
576 if (CDTrans.F != state)
577 {
578 tr = true;
579 System.out.print("\tgoto " + state + " on [" + ((char) j));
580 }
581 }
582 }
583 if (tr)
584 {
585 System.out.println("]");
586 }
587 }
588
589 System.out.println("---------------------- Transition Table "
590 + "----------------------");
591 }
592
593 /***************************************************************
594 Function: emit
595 Description: High-level access function to module.
596 **************************************************************/
597 void emit
598 (
599 CSpec spec,
600 java.io.PrintWriter outstream
601 )
602 throws java.io.IOException
603 {
604 set(spec,outstream);
605
606 if (CUtility.DEBUG)
607 {
608 CUtility.assert(null != m_spec);
609 CUtility.assert(null != m_outstream);
610 }
611
612 if (CUtility.OLD_DEBUG) {
613 print_details();
614 }
615
616 emit_header();
617 emit_construct();
618 emit_helpers();
619 emit_driver();
620 emit_footer();
621
622 reset();
623 }
624
625 /***************************************************************
626 Function: emit_construct
627 Description: Emits constructor, member variables,
628 and constants.
629 **************************************************************/
630 private void emit_construct
631 (
632 )
633 throws java.io.IOException
634 {
635 if (CUtility.DEBUG)
636 {
637 CUtility.assert(null != m_spec);
638 CUtility.assert(null != m_outstream);
639 }
640
641 /* Constants */
642 m_outstream.println("\tprivate final int YY_BUFFER_SIZE = 512;");
643
644 m_outstream.println("\tprivate final int YY_F = -1;");
645 m_outstream.println("\tprivate final int YY_NO_STATE = -1;");
646
647 m_outstream.println("\tprivate final int YY_NOT_ACCEPT = 0;");
648 m_outstream.println("\tprivate final int YY_START = 1;");
649 m_outstream.println("\tprivate final int YY_END = 2;");
650 m_outstream.println("\tprivate final int YY_NO_ANCHOR = 4;");
651
652 // internal
653 m_outstream.println("\tprivate final int YY_BOL = "+m_spec.BOL+";");
654 m_outstream.println("\tprivate final int YY_EOF = "+m_spec.EOF+";");
655 // external
656 if (m_spec.m_integer_type || true == m_spec.m_yyeof)
657 m_outstream.println("\tpublic final int YYEOF = -1;");
658
659 /* User specified class code. */
660 if (null != m_spec.m_class_code)
661 {
662 m_outstream.print(new String(m_spec.m_class_code,0,
663 m_spec.m_class_read));
664 }
665
666 /* Member Variables */
667 m_outstream.println("\tprivate java.io.BufferedReader yy_reader;");
668 m_outstream.println("\tprivate int yy_buffer_index;");
669 m_outstream.println("\tprivate int yy_buffer_read;");
670 m_outstream.println("\tprivate int yy_buffer_start;");
671 m_outstream.println("\tprivate int yy_buffer_end;");
672 m_outstream.println("\tprivate char yy_buffer[];");
673 if (m_spec.m_count_chars)
674 {
675 m_outstream.println("\tprivate int yychar;");
676 }
677 if (m_spec.m_count_lines)
678 {
679 m_outstream.println("\tprivate int yyline;");
680 }
681 m_outstream.println("\tprivate boolean yy_at_bol;");
682 m_outstream.println("\tprivate int yy_lexical_state;");
683 /*if (m_spec.m_count_lines || true == m_spec.m_count_chars)
684 {
685 m_outstream.println("\tprivate int yy_buffer_prev_start;");
686 }*/
687 m_outstream.println();
688
689
690 /* Function: first constructor (Reader) */
691 m_outstream.print("\t");
692 if (true == m_spec.m_public) {
693 m_outstream.print("public ");
694 }
695 m_outstream.print(new String(m_spec.m_class_name));
696 m_outstream.print(" (java.io.Reader reader)");
697
698 if (null != m_spec.m_init_throw_code)
699 {
700 m_outstream.println();
701 m_outstream.print("\t\tthrows ");
702 m_outstream.print(new String(m_spec.m_init_throw_code,0,
703 m_spec.m_init_throw_read));
704 m_outstream.println();
705 m_outstream.println("\t\t{");
706 }
707 else
708 {
709 m_outstream.println(" {");
710 }
711
712 m_outstream.println("\t\tthis ();");
713 m_outstream.println("\t\tif (null == reader) {");
714 m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input "
715 + "stream initializer.\"));");
716 m_outstream.println("\t\t}");
717 m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(reader);");
718 m_outstream.println("\t}");
719 m_outstream.println();
720
721
722 /* Function: second constructor (InputStream) */
723 m_outstream.print("\t");
724 if (true == m_spec.m_public) {
725 m_outstream.print("public ");
726 }
727 m_outstream.print(new String(m_spec.m_class_name));
728 m_outstream.print(" (java.io.InputStream instream)");
729
730 if (null != m_spec.m_init_throw_code)
731 {
732 m_outstream.println();
733 m_outstream.print("\t\tthrows ");
734 m_outstream.println(new String(m_spec.m_init_throw_code,0,
735 m_spec.m_init_throw_read));
736 m_outstream.println("\t\t{");
737 }
738 else
739 {
740 m_outstream.println(" {");
741 }
742
743 m_outstream.println("\t\tthis ();");
744 m_outstream.println("\t\tif (null == instream) {");
745 m_outstream.println("\t\t\tthrow (new Error(\"Error: Bad input "
746 + "stream initializer.\"));");
747 m_outstream.println("\t\t}");
748 m_outstream.println("\t\tyy_reader = new java.io.BufferedReader(new java.io.InputStreamReader(instream));");
749 m_outstream.println("\t}");
750 m_outstream.println();
751
752
753 /* Function: third, private constructor - only for internal use */
754 m_outstream.print("\tprivate ");
755 m_outstream.print(new String(m_spec.m_class_name));
756 m_outstream.print(" ()");
757
758 if (null != m_spec.m_init_throw_code)
759 {
760 m_outstream.println();
761 m_outstream.print("\t\tthrows ");
762 m_outstream.println(new String(m_spec.m_init_throw_code,0,
763 m_spec.m_init_throw_read));
764 m_outstream.println("\t\t{");
765 }
766 else
767 {
768 m_outstream.println(" {");
769 }
770
771 m_outstream.println("\t\tyy_buffer = new char[YY_BUFFER_SIZE];");
772 m_outstream.println("\t\tyy_buffer_read = 0;");
773 m_outstream.println("\t\tyy_buffer_index = 0;");
774 m_outstream.println("\t\tyy_buffer_start = 0;");
775 m_outstream.println("\t\tyy_buffer_end = 0;");
776 if (m_spec.m_count_chars)
777 {
778 m_outstream.println("\t\tyychar = 0;");
779 }
780 if (m_spec.m_count_lines)
781 {
782 m_outstream.println("\t\tyyline = 0;");
783 }
784 m_outstream.println("\t\tyy_at_bol = true;");
785 m_outstream.println("\t\tyy_lexical_state = YYINITIAL;");
786 /*if (m_spec.m_count_lines || true == m_spec.m_count_chars)
787 {
788 m_outstream.println("\t\tyy_buffer_prev_start = 0;");
789 }*/
790
791 /* User specified constructor code. */
792 if (null != m_spec.m_init_code)
793 {
794 m_outstream.print(new String(m_spec.m_init_code,0,
795 m_spec.m_init_read));
796 }
797
798 m_outstream.println("\t}");
799 m_outstream.println();
800
801 }
802
803 /***************************************************************
804 Function: emit_states
805 Description: Emits constants that serve as lexical states,
806 including YYINITIAL.
807 **************************************************************/
808 private void emit_states
809 (
810 )
811 throws java.io.IOException
812 {
813 Enumeration states;
814 String state;
815 int index;
816
817 states = m_spec.m_states.keys();
818 /*index = 0;*/
819 while (states.hasMoreElements())
820 {
821 state = (String) states.nextElement();
822
823 if (CUtility.DEBUG)
824 {
825 CUtility.assert(null != state);
826 }
827
828 m_outstream.println("\tprivate final int "
829 + state
830 + " = "
831 + (m_spec.m_states.get(state)).toString()
832 + ";");
833 /*++index;*/
834 }
835
836 m_outstream.println("\tprivate final int yy_state_dtrans[] = {");
837 for (index = 0; index < m_spec.m_state_dtrans.length; ++index)
838 {
839 m_outstream.print("\t\t" + m_spec.m_state_dtrans[index]);
840 if (index < m_spec.m_state_dtrans.length - 1)
841 {
842 m_outstream.println(",");
843 }
844 else
845 {
846 m_outstream.println();
847 }
848 }
849 m_outstream.println("\t};");
850 }
851
852 /***************************************************************
853 Function: emit_helpers
854 Description: Emits helper functions, particularly
855 error handling and input buffering.
856 **************************************************************/
857 private void emit_helpers
858 (
859 )
860 throws java.io.IOException
861 {
862 if (CUtility.DEBUG)
863 {
864 CUtility.assert(null != m_spec);
865 CUtility.assert(null != m_outstream);
866 }
867
868 /* Function: yy_do_eof */
869 m_outstream.println("\tprivate boolean yy_eof_done = false;");
870 if (null != m_spec.m_eof_code)
871 {
872 m_outstream.print("\tprivate void yy_do_eof ()");
873
874 if (null != m_spec.m_eof_throw_code)
875 {
876 m_outstream.println();
877 m_outstream.print("\t\tthrows ");
878 m_outstream.println(new String(m_spec.m_eof_throw_code,0,
879 m_spec.m_eof_throw_read));
880 m_outstream.println("\t\t{");
881 }
882 else
883 {
884 m_outstream.println(" {");
885 }
886
887 m_outstream.println("\t\tif (false == yy_eof_done) {");
888 m_outstream.print(new String(m_spec.m_eof_code,0,
889 m_spec.m_eof_read));
890 m_outstream.println("\t\t}");
891 m_outstream.println("\t\tyy_eof_done = true;");
892 m_outstream.println("\t}");
893 }
894
895 emit_states();
896
897 /* Function: yybegin */
898 m_outstream.println("\tprivate void yybegin (int state) {");
899 m_outstream.println("\t\tyy_lexical_state = state;");
900 m_outstream.println("\t}");
901
902 /* Function: yy_initial_dtrans */
903 /*m_outstream.println("\tprivate int yy_initial_dtrans (int state) {");
904 m_outstream.println("\t\treturn yy_state_dtrans[state];");
905 m_outstream.println("\t}");*/
906
907 /* Function: yy_advance */
908 m_outstream.println("\tprivate int yy_advance ()");
909 m_outstream.println("\t\tthrows java.io.IOException {");
910 /*m_outstream.println("\t\t{");*/
911 m_outstream.println("\t\tint next_read;");
912 m_outstream.println("\t\tint i;");
913 m_outstream.println("\t\tint j;");
914 m_outstream.println();
915
916 m_outstream.println("\t\tif (yy_buffer_index < yy_buffer_read) {");
917 m_outstream.println("\t\t\treturn yy_buffer[yy_buffer_index++];");
918 /*m_outstream.println("\t\t\t++yy_buffer_index;");*/
919 m_outstream.println("\t\t}");
920 m_outstream.println();
921
922 m_outstream.println("\t\tif (0 != yy_buffer_start) {");
923 m_outstream.println("\t\t\ti = yy_buffer_start;");
924 m_outstream.println("\t\t\tj = 0;");
925 m_outstream.println("\t\t\twhile (i < yy_buffer_read) {");
926 m_outstream.println("\t\t\t\tyy_buffer[j] = yy_buffer[i];");
927 m_outstream.println("\t\t\t\t++i;");
928 m_outstream.println("\t\t\t\t++j;");
929 m_outstream.println("\t\t\t}");
930 m_outstream.println("\t\t\tyy_buffer_end = yy_buffer_end - yy_buffer_start;");
931 m_outstream.println("\t\t\tyy_buffer_start = 0;");
932 m_outstream.println("\t\t\tyy_buffer_read = j;");
933 m_outstream.println("\t\t\tyy_buffer_index = j;");
934 m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,");
935 m_outstream.println("\t\t\t\t\tyy_buffer_read,");
936 m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);");
937 m_outstream.println("\t\t\tif (-1 == next_read) {");
938 m_outstream.println("\t\t\t\treturn YY_EOF;");
939 m_outstream.println("\t\t\t}");
940 m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;");
941 m_outstream.println("\t\t}");
942 m_outstream.println();
943
944 m_outstream.println("\t\twhile (yy_buffer_index >= yy_buffer_read) {");
945 m_outstream.println("\t\t\tif (yy_buffer_index >= yy_buffer.length) {");
946 m_outstream.println("\t\t\t\tyy_buffer = yy_double(yy_buffer);");
947 m_outstream.println("\t\t\t}");
948 m_outstream.println("\t\t\tnext_read = yy_reader.read(yy_buffer,");
949 m_outstream.println("\t\t\t\t\tyy_buffer_read,");
950 m_outstream.println("\t\t\t\t\tyy_buffer.length - yy_buffer_read);");
951 m_outstream.println("\t\t\tif (-1 == next_read) {");
952 m_outstream.println("\t\t\t\treturn YY_EOF;");
953 m_outstream.println("\t\t\t}");
954 m_outstream.println("\t\t\tyy_buffer_read = yy_buffer_read + next_read;");
955 m_outstream.println("\t\t}");
956
957 m_outstream.println("\t\treturn yy_buffer[yy_buffer_index++];");
958 m_outstream.println("\t}");
959
960 /* Function: yy_move_end */
961 m_outstream.println("\tprivate void yy_move_end () {");
962 m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&");
963 m_outstream.println("\t\t '\\n' == yy_buffer[yy_buffer_end-1])");
964 m_outstream.println("\t\t\tyy_buffer_end--;");
965 m_outstream.println("\t\tif (yy_buffer_end > yy_buffer_start &&");
966 m_outstream.println("\t\t '\\r' == yy_buffer[yy_buffer_end-1])");
967 m_outstream.println("\t\t\tyy_buffer_end--;");
968 m_outstream.println("\t}");
969
970 /* Function: yy_mark_start */
971 m_outstream.println("\tprivate boolean yy_last_was_cr=false;");
972 m_outstream.println("\tprivate void yy_mark_start () {");
973 if (m_spec.m_count_lines || true == m_spec.m_count_chars)
974 {
975 if (m_spec.m_count_lines)
976 {
977 m_outstream.println("\t\tint i;");
978 m_outstream.println("\t\tfor (i = yy_buffer_start; "
979 + "i < yy_buffer_index; ++i) {");
980 m_outstream.println("\t\t\tif ('\\n' == yy_buffer[i] && !yy_last_was_cr) {");
981 m_outstream.println("\t\t\t\t++yyline;");
982 m_outstream.println("\t\t\t}");
983 m_outstream.println("\t\t\tif ('\\r' == yy_buffer[i]) {");
984 m_outstream.println("\t\t\t\t++yyline;");
985 m_outstream.println("\t\t\t\tyy_last_was_cr=true;");
986 m_outstream.println("\t\t\t} else yy_last_was_cr=false;");
987 m_outstream.println("\t\t}");
988 }
989 if (m_spec.m_count_chars)
990 {
991 m_outstream.println("\t\tyychar = yychar");
992 m_outstream.println("\t\t\t+ yy_buffer_index - yy_buffer_start;");
993 }
994 }
995 m_outstream.println("\t\tyy_buffer_start = yy_buffer_index;");
996 m_outstream.println("\t}");
997
998 /* Function: yy_mark_end */
999 m_outstream.println("\tprivate void yy_mark_end () {");
1000 m_outstream.println("\t\tyy_buffer_end = yy_buffer_index;");
1001 m_outstream.println("\t}");
1002
1003 /* Function: yy_to_mark */
1004 m_outstream.println("\tprivate void yy_to_mark () {");
1005 m_outstream.println("\t\tyy_buffer_index = yy_buffer_end;");
1006 m_outstream.println("\t\tyy_at_bol = "+
1007 "(yy_buffer_end > yy_buffer_start) &&");
1008 m_outstream.println("\t\t "+
1009 "('\\r' == yy_buffer[yy_buffer_end-1] ||");
1010 m_outstream.println("\t\t "+
1011 " '\\n' == yy_buffer[yy_buffer_end-1] ||");
1012 m_outstream.println("\t\t "+ /* unicode LS */
1013 " 2028/*LS*/ == yy_buffer[yy_buffer_end-1] ||");
1014 m_outstream.println("\t\t "+ /* unicode PS */
1015 " 2029/*PS*/ == yy_buffer[yy_buffer_end-1]);");
1016 m_outstream.println("\t}");
1017
1018 /* Function: yytext */
1019 m_outstream.println("\tprivate java.lang.String yytext () {");
1020 m_outstream.println("\t\treturn (new java.lang.String(yy_buffer,");
1021 m_outstream.println("\t\t\tyy_buffer_start,");
1022 m_outstream.println("\t\t\tyy_buffer_end - yy_buffer_start));");
1023 m_outstream.println("\t}");
1024
1025 /* Function: yylength */
1026 m_outstream.println("\tprivate int yylength () {");
1027 m_outstream.println("\t\treturn yy_buffer_end - yy_buffer_start;");
1028 m_outstream.println("\t}");
1029
1030 /* Function: yy_double */
1031 m_outstream.println("\tprivate char[] yy_double (char buf[]) {");
1032 m_outstream.println("\t\tint i;");
1033 m_outstream.println("\t\tchar newbuf[];");
1034 m_outstream.println("\t\tnewbuf = new char[2*buf.length];");
1035 m_outstream.println("\t\tfor (i = 0; i < buf.length; ++i) {");
1036 m_outstream.println("\t\t\tnewbuf[i] = buf[i];");
1037 m_outstream.println("\t\t}");
1038 m_outstream.println("\t\treturn newbuf;");
1039 m_outstream.println("\t}");
1040
1041 /* Function: yy_error */
1042 m_outstream.println("\tprivate final int YY_E_INTERNAL = 0;");
1043 m_outstream.println("\tprivate final int YY_E_MATCH = 1;");
1044 m_outstream.println("\tprivate java.lang.String yy_error_string[] = {");
1045 m_outstream.println("\t\t\"Error: Internal error.\\n\",");
1046 m_outstream.println("\t\t\"Error: Unmatched input.\\n\"");
1047 m_outstream.println("\t};");
1048 m_outstream.println("\tprivate void yy_error (int code,boolean fatal) {");
1049 m_outstream.println("\t\tjava.lang.System.out.print(yy_error_string[code]);");
1050 m_outstream.println("\t\tjava.lang.System.out.flush();");
1051 m_outstream.println("\t\tif (fatal) {");
1052 m_outstream.println("\t\t\tthrow new Error(\"Fatal Error.\\n\");");
1053 m_outstream.println("\t\t}");
1054 m_outstream.println("\t}");
1055
1056 /* Function: yy_next */
1057 /*m_outstream.println("\tprivate int yy_next (int current,char lookahead) {");
1058 m_outstream.println("\t\treturn yy_nxt[yy_rmap[current]][yy_cmap[lookahead]];");
1059 m_outstream.println("\t}");*/
1060
1061 /* Function: yy_accept */
1062 /*m_outstream.println("\tprivate int yy_accept (int current) {");
1063 m_outstream.println("\t\treturn yy_acpt[current];");
1064 m_outstream.println("\t}");*/
1065
1066
1067 // Function: private int [][] unpackFromString(int size1, int size2, String st)
1068 // Added 6/24/98 Raimondas Lencevicius
1069 // May be made more efficient by replacing String operations
1070 // Assumes correctly formed input String. Performs no error checking
1071 m_outstream.println("\tprivate int[][] unpackFromString"+
1072 "(int size1, int size2, String st) {");
1073 m_outstream.println("\t\tint colonIndex = -1;");
1074 m_outstream.println("\t\tString lengthString;");
1075 m_outstream.println("\t\tint sequenceLength = 0;");
1076 m_outstream.println("\t\tint sequenceInteger = 0;");
1077 m_outstream.println();
1078 m_outstream.println("\t\tint commaIndex;");
1079 m_outstream.println("\t\tString workString;");
1080 m_outstream.println();
1081 m_outstream.println("\t\tint res[][] = new int[size1][size2];");
1082 m_outstream.println("\t\tfor (int i= 0; i < size1; i++) {");
1083 m_outstream.println("\t\t\tfor (int j= 0; j < size2; j++) {");
1084 m_outstream.println("\t\t\t\tif (sequenceLength != 0) {");
1085 m_outstream.println("\t\t\t\t\tres[i][j] = sequenceInteger;");
1086 m_outstream.println("\t\t\t\t\tsequenceLength--;");
1087 m_outstream.println("\t\t\t\t\tcontinue;");
1088 m_outstream.println("\t\t\t\t}");
1089 m_outstream.println("\t\t\t\tcommaIndex = st.indexOf(',');");
1090 m_outstream.println("\t\t\t\tworkString = (commaIndex==-1) ? st :");
1091 m_outstream.println("\t\t\t\t\tst.substring(0, commaIndex);");
1092 m_outstream.println("\t\t\t\tst = st.substring(commaIndex+1);");
1093 m_outstream.println("\t\t\t\tcolonIndex = workString.indexOf(':');");
1094 m_outstream.println("\t\t\t\tif (colonIndex == -1) {");
1095 m_outstream.println("\t\t\t\t\tres[i][j]=Integer.parseInt(workString);");
1096 m_outstream.println("\t\t\t\t\tcontinue;");
1097 m_outstream.println("\t\t\t\t}");
1098 m_outstream.println("\t\t\t\tlengthString =");
1099 m_outstream.println("\t\t\t\t\tworkString.substring(colonIndex+1);");
1100 m_outstream.println("\t\t\t\tsequenceLength="+
1101 "Integer.parseInt(lengthString);");
1102 m_outstream.println("\t\t\t\tworkString="+
1103 "workString.substring(0,colonIndex);");
1104 m_outstream.println("\t\t\t\tsequenceInteger="+
1105 "Integer.parseInt(workString);");
1106 m_outstream.println("\t\t\t\tres[i][j] = sequenceInteger;");
1107 m_outstream.println("\t\t\t\tsequenceLength--;");
1108 m_outstream.println("\t\t\t}");
1109 m_outstream.println("\t\t}");
1110 m_outstream.println("\t\treturn res;");
1111 m_outstream.println("\t}");
1112 }
1113
1114 /***************************************************************
1115 Function: emit_header
1116 Description: Emits class header.
1117 **************************************************************/
1118 private void emit_header
1119 (
1120 )
1121 throws java.io.IOException
1122 {
1123 if (CUtility.DEBUG)
1124 {
1125 CUtility.assert(null != m_spec);
1126 CUtility.assert(null != m_outstream);
1127 }
1128
1129 m_outstream.println();
1130 m_outstream.println();
1131 if (true == m_spec.m_public) {
1132 m_outstream.print("public ");
1133 }
1134 m_outstream.print("class ");
1135 m_outstream.print(new String(m_spec.m_class_name,0,
1136 m_spec.m_class_name.length));
1137 if (m_spec.m_implements_name.length > 0) {
1138 m_outstream.print(" implements ");
1139 m_outstream.print(new String(m_spec.m_implements_name,0,
1140 m_spec.m_implements_name.length));
1141 }
1142 m_outstream.println(" {");
1143 }
1144
1145 /***************************************************************
1146 Function: emit_table
1147 Description: Emits transition table.
1148 **************************************************************/
1149 private void emit_table
1150 (
1151 )
1152 throws java.io.IOException
1153 {
1154 int i;
1155 int elem;
1156 int size;
1157 CDTrans dtrans;
1158 boolean is_start;
1159 boolean is_end;
1160 CAccept accept;
1161
1162 if (CUtility.DEBUG)
1163 {
1164 CUtility.assert(null != m_spec);
1165 CUtility.assert(null != m_outstream);
1166 }
1167
1168 m_outstream.println("\tprivate int yy_acpt[] = {");
1169 size = m_spec.m_accept_vector.size();
1170 for (elem = 0; elem < size; ++elem)
1171 {
1172 accept = (CAccept) m_spec.m_accept_vector.elementAt(elem);
1173
1174 m_outstream.print("\t\t/* "+elem+" */ ");
1175 if (null != accept)
1176 {
1177 is_start = (0 != (m_spec.m_anchor_array[elem] & CSpec.START));
1178 is_end = (0 != (m_spec.m_anchor_array[elem] & CSpec.END));
1179
1180 if (is_start && true == is_end)
1181 {
1182 m_outstream.print("YY_START | YY_END");
1183 }
1184 else if (is_start)
1185 {
1186 m_outstream.print("YY_START");
1187 }
1188 else if (is_end)
1189 {
1190 m_outstream.print("YY_END");
1191 }
1192 else
1193 {
1194 m_outstream.print("YY_NO_ANCHOR");
1195 }
1196 }
1197 else
1198 {
1199 m_outstream.print("YY_NOT_ACCEPT");
1200 }
1201
1202 if (elem < size - 1)
1203 {
1204 m_outstream.print(",");
1205 }
1206
1207 m_outstream.println();
1208 }
1209 m_outstream.println("\t};");
1210
1211 // CSA: modified yy_cmap to use string packing 9-Aug-1999
1212 int[] yy_cmap = new int[m_spec.m_ccls_map.length];
1213 for (i = 0; i < m_spec.m_ccls_map.length; ++i)
1214 yy_cmap[i] = m_spec.m_col_map[m_spec.m_ccls_map[i]];
1215 m_outstream.print("\tprivate int yy_cmap[] = unpackFromString(");
1216 emit_table_as_string(new int[][] { yy_cmap });
1217 m_outstream.println(")[0];");
1218 m_outstream.println();
1219
1220 // CSA: modified yy_rmap to use string packing 9-Aug-1999
1221 m_outstream.print("\tprivate int yy_rmap[] = unpackFromString(");
1222 emit_table_as_string(new int[][] { m_spec.m_row_map });
1223 m_outstream.println(")[0];");
1224 m_outstream.println();
1225
1226 // 6/24/98 Raimondas Lencevicius
1227 // modified to use
1228 // int[][] unpackFromString(int size1, int size2, String st)
1229 size = m_spec.m_dtrans_vector.size();
1230 int[][] yy_nxt = new int[size][];
1231 for (elem=0; elem<size; elem++) {
1232 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(elem);
1233 CUtility.assert(dtrans.m_dtrans.length==m_spec.m_dtrans_ncols);
1234 yy_nxt[elem] = dtrans.m_dtrans;
1235 }
1236 m_outstream.print
1237 ("\tprivate int yy_nxt[][] = unpackFromString(");
1238 emit_table_as_string(yy_nxt);
1239 m_outstream.println(");");
1240 m_outstream.println();
1241 }
1242
1243 /***************************************************************
1244 Function: emit_driver
1245 Description: Output an integer table as a string. Written by
1246 Raimondas Lencevicius 6/24/98; reorganized by CSA 9-Aug-1999.
1247 From his original comments:
1248 yy_nxt[][] values are coded into a string
1249 by printing integers and representing
1250 integer sequences as "value:length" pairs.
1251 **************************************************************/
1252 private void emit_table_as_string(int[][] ia) {
1253 int sequenceLength = 0; // RL - length of the number sequence
1254 boolean sequenceStarted = false; // RL - has number sequence started?
1255 int previousInt = -20; // RL - Bogus -20 state.
1256
1257 // RL - Output matrix size
1258 m_outstream.print(ia.length);
1259 m_outstream.print(",");
1260 m_outstream.print(ia.length>0?ia[0].length:0);
1261 m_outstream.println(",");
1262
1263 StringBuffer outstr = new StringBuffer();
1264
1265 // RL - Output matrix
1266 for (int elem = 0; elem < ia.length; ++elem)
1267 {
1268 for (int i = 0; i < ia[elem].length; ++i)
1269 {
1270 int writeInt = ia[elem][i];
1271 if (writeInt == previousInt) // RL - sequence?
1272 {
1273 if (sequenceStarted)
1274 {
1275 sequenceLength++;
1276 }
1277 else
1278 {
1279 outstr.append(writeInt);
1280 outstr.append(":");
1281 sequenceLength = 2;
1282 sequenceStarted = true;
1283 }
1284 }
1285 else // RL - no sequence or end sequence
1286 {
1287 if (sequenceStarted)
1288 {
1289 outstr.append(sequenceLength);
1290 outstr.append(",");
1291 sequenceLength = 0;
1292 sequenceStarted = false;
1293 }
1294 else
1295 {
1296 if (previousInt != -20)
1297 {
1298 outstr.append(previousInt);
1299 outstr.append(",");
1300 }
1301 }
1302 }
1303 previousInt = writeInt;
1304 // CSA: output in 75 character chunks.
1305 if (outstr.length() > 75) {
1306 String s = outstr.toString();
1307 m_outstream.println("\""+s.substring(0,75)+"\" +");
1308 outstr = new StringBuffer(s.substring(75));
1309 }
1310 }
1311 }
1312 if (sequenceStarted)
1313 {
1314 outstr.append(sequenceLength);
1315 }
1316 else
1317 {
1318 outstr.append(previousInt);
1319 }
1320 // CSA: output in 75 character chunks.
1321 if (outstr.length() > 75) {
1322 String s = outstr.toString();
1323 m_outstream.println("\""+s.substring(0,75)+"\" +");
1324 outstr = new StringBuffer(s.substring(75));
1325 }
1326 m_outstream.print("\""+outstr+"\"");
1327 }
1328
1329 /***************************************************************
1330 Function: emit_driver
1331 Description:
1332 **************************************************************/
1333 private void emit_driver
1334 (
1335 )
1336 throws java.io.IOException
1337 {
1338 if (CUtility.DEBUG)
1339 {
1340 CUtility.assert(null != m_spec);
1341 CUtility.assert(null != m_outstream);
1342 }
1343
1344 emit_table();
1345
1346 if (m_spec.m_integer_type)
1347 {
1348 m_outstream.print("\tpublic int ");
1349 m_outstream.print(new String(m_spec.m_function_name));
1350 m_outstream.println(" ()");
1351 }
1352 else if (m_spec.m_intwrap_type)
1353 {
1354 m_outstream.print("\tpublic java.lang.Integer ");
1355 m_outstream.print(new String(m_spec.m_function_name));
1356 m_outstream.println(" ()");
1357 }
1358 else
1359 {
1360 m_outstream.print("\tpublic ");
1361 m_outstream.print(new String(m_spec.m_type_name));
1362 m_outstream.print(" ");
1363 m_outstream.print(new String(m_spec.m_function_name));
1364 m_outstream.println(" ()");
1365 }
1366
1367 /*m_outstream.println("\t\tthrows java.io.IOException {");*/
1368 m_outstream.print("\t\tthrows java.io.IOException");
1369 if (null != m_spec.m_yylex_throw_code)
1370 {
1371 m_outstream.print(", ");
1372 m_outstream.print(new String(m_spec.m_yylex_throw_code,0,
1373 m_spec.m_yylex_throw_read));
1374 m_outstream.println();
1375 m_outstream.println("\t\t{");
1376 }
1377 else
1378 {
1379 m_outstream.println(" {");
1380 }
1381
1382 m_outstream.println("\t\tint yy_lookahead;");
1383 m_outstream.println("\t\tint yy_anchor = YY_NO_ANCHOR;");
1384 /*m_outstream.println("\t\tint yy_state "
1385 + "= yy_initial_dtrans(yy_lexical_state);");*/
1386 m_outstream.println("\t\tint yy_state "
1387 + "= yy_state_dtrans[yy_lexical_state];");
1388 m_outstream.println("\t\tint yy_next_state = YY_NO_STATE;");
1389 /*m_outstream.println("\t\tint yy_prev_stave = YY_NO_STATE;");*/
1390 m_outstream.println("\t\tint yy_last_accept_state = YY_NO_STATE;");
1391 m_outstream.println("\t\tboolean yy_initial = true;");
1392 m_outstream.println("\t\tint yy_this_accept;");
1393 m_outstream.println();
1394
1395 m_outstream.println("\t\tyy_mark_start();");
1396 /*m_outstream.println("\t\tyy_this_accept = yy_accept(yy_state);");*/
1397 m_outstream.println("\t\tyy_this_accept = yy_acpt[yy_state];");
1398 m_outstream.println("\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
1399 m_outstream.println("\t\t\tyy_last_accept_state = yy_state;");
1400 m_outstream.println("\t\t\tyy_mark_end();");
1401 m_outstream.println("\t\t}");
1402
1403 if (NOT_EDBG)
1404 {
1405 m_outstream.println("\t\tjava.lang.System.out.println(\"Begin\");");
1406 }
1407
1408 m_outstream.println("\t\twhile (true) {");
1409
1410 m_outstream.println("\t\t\tif (yy_initial && yy_at_bol) "+
1411 "yy_lookahead = YY_BOL;");
1412 m_outstream.println("\t\t\telse yy_lookahead = yy_advance();");
1413 m_outstream.println("\t\t\tyy_next_state = YY_F;");
1414 /*m_outstream.println("\t\t\t\tyy_next_state = "
1415 + "yy_next(yy_state,yy_lookahead);");*/
1416 m_outstream.println("\t\t\tyy_next_state = "
1417 + "yy_nxt[yy_rmap[yy_state]][yy_cmap[yy_lookahead]];");
1418
1419 if (NOT_EDBG)
1420 {
1421 m_outstream.println("java.lang.System.out.println(\"Current state: \""
1422 + " + yy_state");
1423 m_outstream.println("+ \"\tCurrent input: \"");
1424 m_outstream.println(" + ((char) yy_lookahead));");
1425 }
1426 if (NOT_EDBG)
1427 {
1428 m_outstream.println("\t\t\tjava.lang.System.out.println(\"State = \""
1429 + "+ yy_state);");
1430 m_outstream.println("\t\t\tjava.lang.System.out.println(\"Accepting status = \""
1431 + "+ yy_this_accept);");
1432 m_outstream.println("\t\t\tjava.lang.System.out.println(\"Last accepting state = \""
1433 + "+ yy_last_accept_state);");
1434 m_outstream.println("\t\t\tjava.lang.System.out.println(\"Next state = \""
1435 + "+ yy_next_state);");
1436 m_outstream.println("\t\t\tjava.lang.System.out.println(\"Lookahead input = \""
1437 + "+ ((char) yy_lookahead));");
1438 }
1439
1440 // handle bare EOF.
1441 m_outstream.println("\t\t\tif (YY_EOF == yy_lookahead "
1442 + "&& true == yy_initial) {");
1443 if (null != m_spec.m_eof_code)
1444 {
1445 m_outstream.println("\t\t\t\tyy_do_eof();");
1446 }
1447 if (true == m_spec.m_integer_type)
1448 {
1449 m_outstream.println("\t\t\t\treturn YYEOF;");
1450 }
1451 else if (null != m_spec.m_eof_value_code)
1452 {
1453 m_outstream.print(new String(m_spec.m_eof_value_code,0,
1454 m_spec.m_eof_value_read));
1455 }
1456 else
1457 {
1458 m_outstream.println("\t\t\t\treturn null;");
1459 }
1460 m_outstream.println("\t\t\t}");
1461
1462 m_outstream.println("\t\t\tif (YY_F != yy_next_state) {");
1463 m_outstream.println("\t\t\t\tyy_state = yy_next_state;");
1464 m_outstream.println("\t\t\t\tyy_initial = false;");
1465 /*m_outstream.println("\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
1466 m_outstream.println("\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
1467 m_outstream.println("\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
1468 m_outstream.println("\t\t\t\t\tyy_last_accept_state = yy_state;");
1469 m_outstream.println("\t\t\t\t\tyy_mark_end();");
1470 m_outstream.println("\t\t\t\t}");
1471 /*m_outstream.println("\t\t\t\tyy_prev_state = yy_state;");*/
1472 /*m_outstream.println("\t\t\t\tyy_state = yy_next_state;");*/
1473 m_outstream.println("\t\t\t}");
1474
1475 m_outstream.println("\t\t\telse {");
1476
1477 m_outstream.println("\t\t\t\tif (YY_NO_STATE == yy_last_accept_state) {");
1478
1479
1480 /*m_outstream.println("\t\t\t\t\tyy_error(YY_E_MATCH,false);");
1481 m_outstream.println("\t\t\t\t\tyy_initial = true;");
1482 m_outstream.println("\t\t\t\t\tyy_state "
1483 + "= yy_state_dtrans[yy_lexical_state];");
1484 m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;");*/
1485 /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/
1486 /*m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;");
1487 m_outstream.println("\t\t\t\t\tyy_mark_start();");*/
1488 /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
1489 /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
1490 m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
1491 m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;");
1492 m_outstream.println("\t\t\t\t\t}");*/
1493
1494 m_outstream.println("\t\t\t\t\tthrow (new Error(\"Lexical Error: Unmatched Input.\"));");
1495 m_outstream.println("\t\t\t\t}");
1496
1497 m_outstream.println("\t\t\t\telse {");
1498
1499 m_outstream.println("\t\t\t\t\tyy_anchor = yy_acpt[yy_last_accept_state];");
1500 /*m_outstream.println("\t\t\t\t\tyy_anchor "
1501 + "= yy_accept(yy_last_accept_state);");*/
1502 m_outstream.println("\t\t\t\t\tif (0 != (YY_END & yy_anchor)) {");
1503 m_outstream.println("\t\t\t\t\t\tyy_move_end();");
1504 m_outstream.println("\t\t\t\t\t}");
1505 m_outstream.println("\t\t\t\t\tyy_to_mark();");
1506
1507 m_outstream.println("\t\t\t\t\tswitch (yy_last_accept_state) {");
1508
1509 emit_actions("\t\t\t\t\t");
1510
1511 m_outstream.println("\t\t\t\t\tdefault:");
1512 m_outstream.println("\t\t\t\t\t\tyy_error(YY_E_INTERNAL,false);");
1513 /*m_outstream.println("\t\t\t\t\t\treturn null;");*/
1514 m_outstream.println("\t\t\t\t\tcase -1:");
1515 m_outstream.println("\t\t\t\t\t}");
1516
1517 m_outstream.println("\t\t\t\t\tyy_initial = true;");
1518 m_outstream.println("\t\t\t\t\tyy_state "
1519 + "= yy_state_dtrans[yy_lexical_state];");
1520 m_outstream.println("\t\t\t\t\tyy_next_state = YY_NO_STATE;");
1521 /*m_outstream.println("\t\t\t\t\tyy_prev_state = YY_NO_STATE;");*/
1522 m_outstream.println("\t\t\t\t\tyy_last_accept_state = YY_NO_STATE;");
1523
1524 m_outstream.println("\t\t\t\t\tyy_mark_start();");
1525
1526 /*m_outstream.println("\t\t\t\t\tyy_this_accept = yy_accept(yy_state);");*/
1527 m_outstream.println("\t\t\t\t\tyy_this_accept = yy_acpt[yy_state];");
1528 m_outstream.println("\t\t\t\t\tif (YY_NOT_ACCEPT != yy_this_accept) {");
1529 m_outstream.println("\t\t\t\t\t\tyy_last_accept_state = yy_state;");
1530 m_outstream.println("\t\t\t\t\t\tyy_mark_end();");
1531 m_outstream.println("\t\t\t\t\t}");
1532
1533 m_outstream.println("\t\t\t\t}");
1534 m_outstream.println("\t\t\t}");
1535 m_outstream.println("\t\t}");
1536 m_outstream.println("\t}");
1537
1538 /*m_outstream.println("\t\t\t\t");
1539 m_outstream.println("\t\t\t");
1540 m_outstream.println("\t\t\t");
1541 m_outstream.println("\t\t\t");
1542 m_outstream.println("\t\t\t");
1543 m_outstream.println("\t\t}");*/
1544 }
1545
1546 /***************************************************************
1547 Function: emit_actions
1548 Description:
1549 **************************************************************/
1550 private void emit_actions
1551 (
1552 String tabs
1553 )
1554 throws java.io.IOException
1555 {
1556 int elem;
1557 int size;
1558 int bogus_index;
1559 CAccept accept;
1560
1561 if (CUtility.DEBUG)
1562 {
1563 CUtility.assert(m_spec.m_accept_vector.size()
1564 == m_spec.m_anchor_array.length);
1565 }
1566
1567 bogus_index = -2;
1568 size = m_spec.m_accept_vector.size();
1569 for (elem = 0; elem < size; ++elem)
1570 {
1571 accept = (CAccept) m_spec.m_accept_vector.elementAt(elem);
1572 if (null != accept)
1573 {
1574 m_outstream.println(tabs + "case " + elem
1575 + ":");
1576 m_outstream.print(tabs + "\t");
1577 m_outstream.print(new String(accept.m_action,0,
1578 accept.m_action_read));
1579 m_outstream.println();
1580 m_outstream.println(tabs + "case " + bogus_index + ":");
1581 m_outstream.println(tabs + "\tbreak;");
1582 --bogus_index;
1583 }
1584 }
1585 }
1586
1587 /***************************************************************
1588 Function: emit_footer
1589 Description:
1590 **************************************************************/
1591 private void emit_footer
1592 (
1593 )
1594 throws java.io.IOException
1595 {
1596 if (CUtility.DEBUG)
1597 {
1598 CUtility.assert(null != m_spec);
1599 CUtility.assert(null != m_outstream);
1600 }
1601
1602 m_outstream.println("}");
1603 }
1604 }
1605
1606 /***************************************************************
1607 Class: CBunch
1608 **************************************************************/
1609 class CBunch
1610 {
1611 /***************************************************************
1612 Member Variables
1613 **************************************************************/
1614 Vector m_nfa_set; /* Vector of CNfa states in dfa state. */
1615 SparseBitSet m_nfa_bit; /* BitSet representation of CNfa labels. */
1616 CAccept m_accept; /* Accepting actions, or null if nonaccepting state. */
1617 int m_anchor; /* Anchors on regular expression. */
1618 int m_accept_index; /* CNfa index corresponding to accepting actions. */
1619
1620 /***************************************************************
1621 Function: CBunch
1622 Description: Constructor.
1623 **************************************************************/
1624 CBunch
1625 (
1626 )
1627 {
1628 m_nfa_set = null;
1629 m_nfa_bit = null;
1630 m_accept = null;
1631 m_anchor = CSpec.NONE;
1632 m_accept_index = -1;
1633 }
1634 }
1635
1636 /***************************************************************
1637 Class: CMakeNfa
1638 **************************************************************/
1639 class CMakeNfa
1640 {
1641 /***************************************************************
1642 Member Variables
1643 **************************************************************/
1644 private CSpec m_spec;
1645 private CLexGen m_lexGen;
1646 private CInput m_input;
1647
1648 /***************************************************************
1649 Function: CMakeNfa
1650 Description: Constructor.
1651 **************************************************************/
1652 CMakeNfa
1653 (
1654 )
1655 {
1656 reset();
1657 }
1658
1659 /***************************************************************
1660 Function: reset
1661 Description: Resets CMakeNfa member variables.
1662 **************************************************************/
1663 private void reset
1664 (
1665 )
1666 {
1667 m_input = null;
1668 m_lexGen = null;
1669 m_spec = null;
1670 }
1671
1672 /***************************************************************
1673 Function: set
1674 Description: Sets CMakeNfa member variables.
1675 **************************************************************/
1676 private void set
1677 (
1678 CLexGen lexGen,
1679 CSpec spec,
1680 CInput input
1681 )
1682 {
1683 if (CUtility.DEBUG)
1684 {
1685 CUtility.assert(null != input);
1686 CUtility.assert(null != lexGen);
1687 CUtility.assert(null != spec);
1688 }
1689
1690 m_input = input;
1691 m_lexGen = lexGen;
1692 m_spec = spec;
1693 }
1694
1695 /***************************************************************
1696 Function: allocate_BOL_EOF
1697 Description: Expands character class to include special BOL and
1698 EOF characters. Puts numeric index of these characters in
1699 input CSpec.
1700 **************************************************************/
1701 void allocate_BOL_EOF
1702 (
1703 CSpec spec
1704 )
1705 {
1706 CUtility.assert(CSpec.NUM_PSEUDO==2);
1707 spec.BOL = spec.m_dtrans_ncols++;
1708 spec.EOF = spec.m_dtrans_ncols++;
1709 }
1710
1711 /***************************************************************
1712 Function: thompson
1713 Description: High level access function to module.
1714 Deposits result in input CSpec.
1715 **************************************************************/
1716 void thompson
1717 (
1718 CLexGen lexGen,
1719 CSpec spec,
1720 CInput input
1721 )
1722 throws java.io.IOException
1723 {
1724 int i;
1725 CNfa elem;
1726 int size;
1727
1728 /* Set member variables. */
1729 reset();
1730 set(lexGen,spec,input);
1731
1732 size = m_spec.m_states.size();
1733 m_spec.m_state_rules = new Vector[size];
1734 for (i = 0; i < size; ++i)
1735 {
1736 m_spec.m_state_rules[i] = new Vector();
1737 }
1738
1739 /* Initialize current token variable
1740 and create nfa. */
1741 /*m_spec.m_current_token = m_lexGen.EOS;
1742 m_lexGen.advance();*/
1743
1744 m_spec.m_nfa_start = machine();
1745
1746 /* Set labels in created nfa machine. */
1747 size = m_spec.m_nfa_states.size();
1748 for (i = 0; i < size; ++i)
1749 {
1750 elem = (CNfa) m_spec.m_nfa_states.elementAt(i);
1751 elem.m_label = i;
1752 }
1753
1754 /* Debugging output. */
1755 if (CUtility.DO_DEBUG)
1756 {
1757 m_lexGen.print_nfa();
1758 }
1759
1760 if (m_spec.m_verbose)
1761 {
1762 System.out.println("NFA comprised of "
1763 + (m_spec.m_nfa_states.size() + 1)
1764 + " states.");
1765 }
1766
1767 reset();
1768 }
1769
1770 /***************************************************************
1771 Function: discardCNfa
1772 Description:
1773 **************************************************************/
1774 private void discardCNfa
1775 (
1776 CNfa nfa
1777 )
1778 {
1779 m_spec.m_nfa_states.removeElement(nfa);
1780 }
1781
1782 /***************************************************************
1783 Function: processStates
1784 Description:
1785 **************************************************************/
1786 private void processStates
1787 (
1788 SparseBitSet states,
1789 CNfa current
1790 )
1791 {
1792 int size;
1793 int i;
1794
1795 size = m_spec.m_states.size();
1796 for (i = 0; i < size; ++i)
1797 {
1798 if (states.get(i))
1799 {
1800 m_spec.m_state_rules[i].addElement(current);
1801 }
1802 }
1803 }
1804
1805 /***************************************************************
1806 Function: machine
1807 Description: Recursive descent regular expression parser.
1808 **************************************************************/
1809 private CNfa machine
1810 (
1811 )
1812 throws java.io.IOException
1813 {
1814 CNfa start;
1815 CNfa p;
1816 SparseBitSet states;
1817
1818 if (CUtility.DESCENT_DEBUG)
1819 {
1820 CUtility.enter("machine",m_spec.m_lexeme,m_spec.m_current_token);
1821 }
1822
1823 start = CAlloc.newCNfa(m_spec);
1824 p = start;
1825
1826 states = m_lexGen.getStates();
1827
1828 /* Begin: Added for states. */
1829 m_spec.m_current_token = m_lexGen.EOS;
1830 m_lexGen.advance();
1831 /* End: Added for states. */
1832
1833 if (m_lexGen.END_OF_INPUT != m_spec.m_current_token) // CSA fix.
1834 {
1835 p.m_next = rule();
1836
1837 processStates(states,p.m_next);
1838 }
1839
1840 while (m_lexGen.END_OF_INPUT != m_spec.m_current_token)
1841 {
1842 /* Make state changes HERE. */
1843 states = m_lexGen.getStates();
1844
1845 /* Begin: Added for states. */
1846 m_lexGen.advance();
1847 if (m_lexGen.END_OF_INPUT == m_spec.m_current_token)
1848 {
1849 break;
1850 }
1851 /* End: Added for states. */
1852
1853 p.m_next2 = CAlloc.newCNfa(m_spec);
1854 p = p.m_next2;
1855 p.m_next = rule();
1856
1857 processStates(states,p.m_next);
1858 }
1859
1860 // CSA: add pseudo-rules for BOL and EOF
1861 SparseBitSet all_states = new SparseBitSet();
1862 for (int i = 0; i < m_spec.m_states.size(); ++i)
1863 all_states.set(i);
1864 p.m_next2 = CAlloc.newCNfa(m_spec);
1865 p = p.m_next2;
1866 p.m_next = CAlloc.newCNfa(m_spec);
1867 p.m_next.m_edge = CNfa.CCL;
1868 p.m_next.m_next = CAlloc.newCNfa(m_spec);
1869 p.m_next.m_set = new CSet();
1870 p.m_next.m_set.add(m_spec.BOL);
1871 p.m_next.m_set.add(m_spec.EOF);
1872 p.m_next.m_next.m_accept = // do-nothing accept rule
1873 new CAccept(new char[0], 0, m_input.m_line_number+1);
1874 processStates(all_states,p.m_next);
1875 // CSA: done.
1876
1877 if (CUtility.DESCENT_DEBUG)
1878 {
1879 CUtility.leave("machine",m_spec.m_lexeme,m_spec.m_current_token);
1880 }
1881
1882 return start;
1883 }
1884
1885 /***************************************************************
1886 Function: rule
1887 Description: Recursive descent regular expression parser.
1888 **************************************************************/
1889 private CNfa rule
1890 (
1891 )
1892 throws java.io.IOException
1893 {
1894 CNfaPair pair;
1895 CNfa p;
1896 CNfa start = null;
1897 CNfa end = null;
1898 int anchor = CSpec.NONE;
1899
1900 if (CUtility.DESCENT_DEBUG)
1901 {
1902 CUtility.enter("rule",m_spec.m_lexeme,m_spec.m_current_token);
1903 }
1904
1905 pair = CAlloc.newCNfaPair();
1906
1907 if (m_lexGen.AT_BOL == m_spec.m_current_token)
1908 {
1909 anchor = anchor | CSpec.START;
1910 m_lexGen.advance();
1911 expr(pair);
1912
1913 // CSA: fixed beginning-of-line operator. 8-aug-1999
1914 start = CAlloc.newCNfa(m_spec);
1915 start.m_edge = m_spec.BOL;
1916 start.m_next = pair.m_start;
1917 end = pair.m_end;
1918 }
1919 else
1920 {
1921 expr(pair);
1922 start = pair.m_start;
1923 end = pair.m_end;
1924 }
1925
1926 if (m_lexGen.AT_EOL == m_spec.m_current_token)
1927 {
1928 m_lexGen.advance();
1929 // CSA: fixed end-of-line operator. 8-aug-1999
1930 CNfaPair nlpair = CAlloc.newNLPair(m_spec);
1931 end.m_next = CAlloc.newCNfa(m_spec);
1932 end.m_next.m_next = nlpair.m_start;
1933 end.m_next.m_next2 = CAlloc.newCNfa(m_spec);
1934 end.m_next.m_next2.m_edge = m_spec.EOF;
1935 end.m_next.m_next2.m_next = nlpair.m_end;
1936 end = nlpair.m_end;
1937 anchor = anchor | CSpec.END;
1938 }
1939
1940 /* Check for null rules. Charles Fischer found this bug. [CSA] */
1941 if (end==null)
1942 CError.parse_error(CError.E_ZERO, m_input.m_line_number);
1943
1944 /* Handle end of regular expression. See page 103. */
1945 end.m_accept = m_lexGen.packAccept();
1946 end.m_anchor = anchor;
1947
1948 /* Begin: Removed for states. */
1949 /*m_lexGen.advance();*/
1950 /* End: Removed for states. */
1951
1952 if (CUtility.DESCENT_DEBUG)
1953 {
1954 CUtility.leave("rule",m_spec.m_lexeme,m_spec.m_current_token);
1955 }
1956
1957 return start;
1958 }
1959
1960 /***************************************************************
1961 Function: expr
1962 Description: Recursive descent regular expression parser.
1963 **************************************************************/
1964 private void expr
1965 (
1966 CNfaPair pair
1967 )
1968 throws java.io.IOException
1969 {
1970 CNfaPair e2_pair;
1971 CNfa p;
1972
1973 if (CUtility.DESCENT_DEBUG)
1974 {
1975 CUtility.enter("expr",m_spec.m_lexeme,m_spec.m_current_token);
1976 }
1977
1978 if (CUtility.DEBUG)
1979 {
1980 CUtility.assert(null != pair);
1981 }
1982
1983 e2_pair = CAlloc.newCNfaPair();
1984
1985 cat_expr(pair);
1986
1987 while (m_lexGen.OR == m_spec.m_current_token)
1988 {
1989 m_lexGen.advance();
1990 cat_expr(e2_pair);
1991
1992 p = CAlloc.newCNfa(m_spec);
1993 p.m_next2 = e2_pair.m_start;
1994 p.m_next = pair.m_start;
1995 pair.m_start = p;
1996
1997 p = CAlloc.newCNfa(m_spec);
1998 pair.m_end.m_next = p;
1999 e2_pair.m_end.m_next = p;
2000 pair.m_end = p;
2001 }
2002
2003 if (CUtility.DESCENT_DEBUG)
2004 {
2005 CUtility.leave("expr",m_spec.m_lexeme,m_spec.m_current_token);
2006 }
2007 }
2008
2009 /***************************************************************
2010 Function: cat_expr
2011 Description: Recursive descent regular expression parser.
2012 **************************************************************/
2013 private void cat_expr
2014 (
2015 CNfaPair pair
2016 )
2017 throws java.io.IOException
2018 {
2019 CNfaPair e2_pair;
2020
2021 if (CUtility.DESCENT_DEBUG)
2022 {
2023 CUtility.enter("cat_expr",m_spec.m_lexeme,m_spec.m_current_token);
2024 }
2025
2026 if (CUtility.DEBUG)
2027 {
2028 CUtility.assert(null != pair);
2029 }
2030
2031 e2_pair = CAlloc.newCNfaPair();
2032
2033 if (first_in_cat(m_spec.m_current_token))
2034 {
2035 factor(pair);
2036 }
2037
2038 while (first_in_cat(m_spec.m_current_token))
2039 {
2040 factor(e2_pair);
2041
2042 /* Destroy */
2043 pair.m_end.mimic(e2_pair.m_start);
2044 discardCNfa(e2_pair.m_start);
2045
2046 pair.m_end = e2_pair.m_end;
2047 }
2048
2049 if (CUtility.DESCENT_DEBUG)
2050 {
2051 CUtility.leave("cat_expr",m_spec.m_lexeme,m_spec.m_current_token);
2052 }
2053 }
2054
2055 /***************************************************************
2056 Function: first_in_cat
2057 Description: Recursive descent regular expression parser.
2058 **************************************************************/
2059 private boolean first_in_cat
2060 (
2061 int token
2062 )
2063 {
2064 switch (token)
2065 {
2066 case CLexGen.CLOSE_PAREN:
2067 case CLexGen.AT_EOL:
2068 case CLexGen.OR:
2069 case CLexGen.EOS:
2070 return false;
2071
2072 case CLexGen.CLOSURE:
2073 case CLexGen.PLUS_CLOSE:
2074 case CLexGen.OPTIONAL:
2075 CError.parse_error(CError.E_CLOSE,m_input.m_line_number);
2076 return false;
2077
2078 case CLexGen.CCL_END:
2079 CError.parse_error(CError.E_BRACKET,m_input.m_line_number);
2080 return false;
2081
2082 case CLexGen.AT_BOL:
2083 CError.parse_error(CError.E_BOL,m_input.m_line_number);
2084 return false;
2085
2086 default:
2087 break;
2088 }
2089
2090 return true;
2091 }
2092
2093 /***************************************************************
2094 Function: factor
2095 Description: Recursive descent regular expression parser.
2096 **************************************************************/
2097 private void factor
2098 (
2099 CNfaPair pair
2100 )
2101 throws java.io.IOException
2102 {
2103 CNfa start = null;
2104 CNfa end = null;
2105
2106 if (CUtility.DESCENT_DEBUG)
2107 {
2108 CUtility.enter("factor",m_spec.m_lexeme,m_spec.m_current_token);
2109 }
2110
2111 term(pair);
2112
2113 if (m_lexGen.CLOSURE == m_spec.m_current_token
2114 || m_lexGen.PLUS_CLOSE == m_spec.m_current_token
2115 || m_lexGen.OPTIONAL == m_spec.m_current_token)
2116 {
2117 start = CAlloc.newCNfa(m_spec);
2118 end = CAlloc.newCNfa(m_spec);
2119
2120 start.m_next = pair.m_start;
2121 pair.m_end.m_next = end;
2122
2123 if (m_lexGen.CLOSURE == m_spec.m_current_token
2124 || m_lexGen.OPTIONAL == m_spec.m_current_token)
2125 {
2126 start.m_next2 = end;
2127 }
2128
2129 if (m_lexGen.CLOSURE == m_spec.m_current_token
2130 || m_lexGen.PLUS_CLOSE == m_spec.m_current_token)
2131 {
2132 pair.m_end.m_next2 = pair.m_start;
2133 }
2134
2135 pair.m_start = start;
2136 pair.m_end = end;
2137 m_lexGen.advance();
2138 }
2139
2140 if (CUtility.DESCENT_DEBUG)
2141 {
2142 CUtility.leave("factor",m_spec.m_lexeme,m_spec.m_current_token);
2143 }
2144 }
2145
2146 /***************************************************************
2147 Function: term
2148 Description: Recursive descent regular expression parser.
2149 **************************************************************/
2150 private void term
2151 (
2152 CNfaPair pair
2153 )
2154 throws java.io.IOException
2155 {
2156 CNfa start;
2157 boolean isAlphaL;
2158 int c;
2159
2160 if (CUtility.DESCENT_DEBUG)
2161 {
2162 CUtility.enter("term",m_spec.m_lexeme,m_spec.m_current_token);
2163 }
2164
2165 if (m_lexGen.OPEN_PAREN == m_spec.m_current_token)
2166 {
2167 m_lexGen.advance();
2168 expr(pair);
2169
2170 if (m_lexGen.CLOSE_PAREN == m_spec.m_current_token)
2171 {
2172 m_lexGen.advance();
2173 }
2174 else
2175 {
2176 CError.parse_error(CError.E_SYNTAX,m_input.m_line_number);
2177 }
2178 }
2179 else
2180 {
2181 start = CAlloc.newCNfa(m_spec);
2182 pair.m_start = start;
2183
2184 start.m_next = CAlloc.newCNfa(m_spec);
2185 pair.m_end = start.m_next;
2186
2187 if (m_lexGen.L == m_spec.m_current_token &&
2188 Character.isLetter(m_spec.m_lexeme))
2189 {
2190 isAlphaL = true;
2191 }
2192 else
2193 {
2194 isAlphaL = false;
2195 }
2196 if (false == (m_lexGen.ANY == m_spec.m_current_token
2197 || m_lexGen.CCL_START == m_spec.m_current_token
2198 || (m_spec.m_ignorecase && isAlphaL)))
2199 {
2200 start.m_edge = m_spec.m_lexeme;
2201 m_lexGen.advance();
2202 }
2203 else
2204 {
2205 start.m_edge = CNfa.CCL;
2206
2207 start.m_set = new CSet();
2208
2209 /* Match case-insensitive letters using character class. */
2210 if (m_spec.m_ignorecase && isAlphaL)
2211 {
2212 start.m_set.addncase(m_spec.m_lexeme);
2213 }
2214 /* Match dot (.) using character class. */
2215 else if (m_lexGen.ANY == m_spec.m_current_token)
2216 {
2217 start.m_set.add('\n');
2218 start.m_set.add('\r');
2219 // CSA: exclude BOL and EOF from character classes
2220 start.m_set.add(m_spec.BOL);
2221 start.m_set.add(m_spec.EOF);
2222 start.m_set.complement();
2223 }
2224 else
2225 {
2226 m_lexGen.advance();
2227 if (m_lexGen.AT_BOL == m_spec.m_current_token)
2228 {
2229 m_lexGen.advance();
2230
2231 // CSA: exclude BOL and EOF from character classes
2232 start.m_set.add(m_spec.BOL);
2233 start.m_set.add(m_spec.EOF);
2234 start.m_set.complement();
2235 }
2236 if (false == (m_lexGen.CCL_END == m_spec.m_current_token))
2237 {
2238 dodash(start.m_set);
2239 }
2240 /*else
2241 {
2242 for (c = 0; c <= ' '; ++c)
2243 {
2244 start.m_set.add((byte) c);
2245 }
2246 }*/
2247 }
2248 m_lexGen.advance();
2249 }
2250 }
2251
2252 if (CUtility.DESCENT_DEBUG)
2253 {
2254 CUtility.leave("term",m_spec.m_lexeme,m_spec.m_current_token);
2255 }
2256 }
2257
2258 /***************************************************************
2259 Function: dodash
2260 Description: Recursive descent regular expression parser.
2261 **************************************************************/
2262 private void dodash
2263 (
2264 CSet set
2265 )
2266 throws java.io.IOException
2267 {
2268 int first = -1;
2269
2270 if (CUtility.DESCENT_DEBUG)
2271 {
2272 CUtility.enter("dodash",m_spec.m_lexeme,m_spec.m_current_token);
2273 }
2274
2275 while (m_lexGen.EOS != m_spec.m_current_token
2276 && m_lexGen.CCL_END != m_spec.m_current_token)
2277 {
2278 // DASH loses its special meaning if it is first in class.
2279 if (m_lexGen.DASH == m_spec.m_current_token && -1 != first)
2280 {
2281 m_lexGen.advance();
2282 // DASH loses its special meaning if it is last in class.
2283 if (m_spec.m_current_token == m_lexGen.CCL_END)
2284 {
2285 // 'first' already in set.
2286 set.add('-');
2287 break;
2288 }
2289 for ( ; first <= m_spec.m_lexeme; ++first)
2290 {
2291 if (m_spec.m_ignorecase)
2292 set.addncase((char)first);
2293 else
2294 set.add(first);
2295 }
2296 }
2297 else
2298 {
2299 first = m_spec.m_lexeme;
2300 if (m_spec.m_ignorecase)
2301 set.addncase(m_spec.m_lexeme);
2302 else
2303 set.add(m_spec.m_lexeme);
2304 }
2305
2306 m_lexGen.advance();
2307 }
2308
2309 if (CUtility.DESCENT_DEBUG)
2310 {
2311 CUtility.leave("dodash",m_spec.m_lexeme,m_spec.m_current_token);
2312 }
2313 }
2314 }
2315
2316 /**
2317 * Extract character classes from NFA and simplify.
2318 * @author C. Scott Ananian 25-Jul-1999
2319 */
2320 class CSimplifyNfa
2321 {
2322 private int[] ccls; // character class mapping.
2323 private int original_charset_size; // original charset size
2324 private int mapped_charset_size; // reduced charset size
2325
2326 void simplify(CSpec m_spec) {
2327 computeClasses(m_spec); // initialize fields.
2328
2329 // now rewrite the NFA using our character class mapping.
2330 for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) {
2331 CNfa nfa = (CNfa) e.nextElement();
2332 if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON)
2333 continue; // no change.
2334 if (nfa.m_edge==CNfa.CCL) {
2335 CSet ncset = new CSet();
2336 ncset.map(nfa.m_set, ccls); // map it.
2337 nfa.m_set = ncset;
2338 } else { // single character
2339 nfa.m_edge = ccls[nfa.m_edge]; // map it.
2340 }
2341 }
2342
2343 // now update m_spec with the mapping.
2344 m_spec.m_ccls_map = ccls;
2345 m_spec.m_dtrans_ncols = mapped_charset_size;
2346 }
2347 /** Compute minimum set of character classes needed to disambiguate
2348 * edges. We optimistically assume that every character belongs to
2349 * a single character class, and then incrementally split classes
2350 * as we see edges that require discrimination between characters in
2351 * the class. [CSA, 25-Jul-1999] */
2352 private void computeClasses(CSpec m_spec) {
2353 this.original_charset_size = m_spec.m_dtrans_ncols;
2354 this.ccls = new int[original_charset_size]; // initially all zero.
2355
2356 int nextcls = 1;
2357 SparseBitSet clsA = new SparseBitSet(), clsB = new SparseBitSet();
2358 Hashtable h = new Hashtable();
2359
2360 System.out.print("Working on character classes.");
2361 for (Enumeration e=m_spec.m_nfa_states.elements(); e.hasMoreElements(); ) {
2362 CNfa nfa = (CNfa) e.nextElement();
2363 if (nfa.m_edge==CNfa.EMPTY || nfa.m_edge==CNfa.EPSILON)
2364 continue; // no discriminatory information.
2365 clsA.clearAll(); clsB.clearAll();
2366 for (int i=0; i<ccls.length; i++)
2367 if (nfa.m_edge==i || // edge labeled with a character
2368 nfa.m_edge==CNfa.CCL && nfa.m_set.contains(i)) // set of characters
2369 clsA.set(ccls[i]);
2370 else
2371 clsB.set(ccls[i]);
2372 // now figure out which character classes we need to split.
2373 clsA.and(clsB); // split the classes which show up on both sides of edge
2374 System.out.print(clsA.size()==0?".":":");
2375 if (clsA.size()==0) continue; // nothing to do.
2376 // and split them.
2377 h.clear(); // h will map old to new class name
2378 for (int i=0; i<ccls.length; i++)
2379 if (clsA.get(ccls[i])) // a split class
2380 if (nfa.m_edge==i ||
2381 nfa.m_edge==CNfa.CCL && nfa.m_set.contains(i)) { // on A side
2382 Integer split = new Integer(ccls[i]);
2383 if (!h.containsKey(split))
2384 h.put(split, new Integer(nextcls++)); // make new class
2385 ccls[i] = ((Integer)h.get(split)).intValue();
2386 }
2387 }
2388 System.out.println();
2389 System.out.println("NFA has "+nextcls+" distinct character classes.");
2390
2391 this.mapped_charset_size = nextcls;
2392 }
2393 }
2394
2395 /***************************************************************
2396 Class: CMinimize
2397 **************************************************************/
2398 class CMinimize
2399 {
2400 /***************************************************************
2401 Member Variables
2402 **************************************************************/
2403 CSpec m_spec;
2404 Vector m_group;
2405 int m_ingroup[];
2406
2407 /***************************************************************
2408 Function: CMinimize
2409 Description: Constructor.
2410 **************************************************************/
2411 CMinimize
2412 (
2413 )
2414 {
2415 reset();
2416 }
2417
2418 /***************************************************************
2419 Function: reset
2420 Description: Resets member variables.
2421 **************************************************************/
2422 private void reset
2423 (
2424 )
2425 {
2426 m_spec = null;
2427 m_group = null;
2428 m_ingroup = null;
2429 }
2430
2431 /***************************************************************
2432 Function: set
2433 Description: Sets member variables.
2434 **************************************************************/
2435 private void set
2436 (
2437 CSpec spec
2438 )
2439 {
2440 if (CUtility.DEBUG)
2441 {
2442 CUtility.assert(null != spec);
2443 }
2444
2445 m_spec = spec;
2446 m_group = null;
2447 m_ingroup = null;
2448 }
2449
2450 /***************************************************************
2451 Function: min_dfa
2452 Description: High-level access function to module.
2453 **************************************************************/
2454 void min_dfa
2455 (
2456 CSpec spec
2457 )
2458 {
2459 set(spec);
2460
2461 /* Remove redundant states. */
2462 minimize();
2463
2464 /* Column and row compression.
2465 Save accept states in auxilary vector. */
2466 reduce();
2467
2468 reset();
2469 }
2470
2471 /***************************************************************
2472 Function: col_copy
2473 Description: Copies source column into destination column.
2474 **************************************************************/
2475 private void col_copy
2476 (
2477 int dest,
2478 int src
2479 )
2480 {
2481 int n;
2482 int i;
2483 CDTrans dtrans;
2484
2485 n = m_spec.m_dtrans_vector.size();
2486 for (i = 0; i < n; ++i)
2487 {
2488 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
2489 dtrans.m_dtrans[dest] = dtrans.m_dtrans[src];
2490 }
2491 }
2492
2493 /***************************************************************
2494 Function: trunc_col
2495 Description: Truncates each column to the 'correct' length.
2496 **************************************************************/
2497 private void trunc_col
2498 (
2499 )
2500 {
2501 int n;
2502 int i;
2503 CDTrans dtrans;
2504
2505 n = m_spec.m_dtrans_vector.size();
2506 for (i = 0; i < n; ++i)
2507 {
2508 int[] ndtrans = new int[m_spec.m_dtrans_ncols];
2509 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
2510 System.arraycopy(dtrans.m_dtrans, 0, ndtrans, 0, ndtrans.length);
2511 dtrans.m_dtrans = ndtrans;
2512 }
2513 }
2514 /***************************************************************
2515 Function: row_copy
2516 Description: Copies source row into destination row.
2517 **************************************************************/
2518 private void row_copy
2519 (
2520 int dest,
2521 int src
2522 )
2523 {
2524 CDTrans dtrans;
2525
2526 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(src);
2527 m_spec.m_dtrans_vector.setElementAt(dtrans,dest);
2528 }
2529
2530 /***************************************************************
2531 Function: col_equiv
2532 Description:
2533 **************************************************************/
2534 private boolean col_equiv
2535 (
2536 int col1,
2537 int col2
2538 )
2539 {
2540 int n;
2541 int i;
2542 CDTrans dtrans;
2543
2544 n = m_spec.m_dtrans_vector.size();
2545 for (i = 0; i < n; ++i)
2546 {
2547 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
2548 if (dtrans.m_dtrans[col1] != dtrans.m_dtrans[col2])
2549 {
2550 return false;
2551 }
2552 }
2553
2554 return true;
2555 }
2556
2557 /***************************************************************
2558 Function: row_equiv
2559 Description:
2560 **************************************************************/
2561 private boolean row_equiv
2562 (
2563 int row1,
2564 int row2
2565 )
2566 {
2567 int i;
2568 CDTrans dtrans1;
2569 CDTrans dtrans2;
2570
2571 dtrans1 = (CDTrans) m_spec.m_dtrans_vector.elementAt(row1);
2572 dtrans2 = (CDTrans) m_spec.m_dtrans_vector.elementAt(row2);
2573
2574 for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
2575 {
2576 if (dtrans1.m_dtrans[i] != dtrans2.m_dtrans[i])
2577 {
2578 return false;
2579 }
2580 }
2581
2582 return true;
2583 }
2584
2585 /***************************************************************
2586 Function: reduce
2587 Description:
2588 **************************************************************/
2589 private void reduce
2590 (
2591 )
2592 {
2593 int i;
2594 int j;
2595 int k;
2596 int nrows;
2597 int reduced_ncols;
2598 int reduced_nrows;
2599 SparseBitSet set;
2600 CDTrans dtrans;
2601 int size;
2602
2603 set = new SparseBitSet();
2604
2605 /* Save accept nodes and anchor entries. */
2606 size = m_spec.m_dtrans_vector.size();
2607 m_spec.m_anchor_array = new int[size];
2608 m_spec.m_accept_vector = new Vector();
2609 for (i = 0; i < size; ++i)
2610 {
2611 dtrans = (CDTrans) m_spec.m_dtrans_vector.elementAt(i);
2612 m_spec.m_accept_vector.addElement(dtrans.m_accept);
2613 m_spec.m_anchor_array[i] = dtrans.m_anchor;
2614 dtrans.m_accept = null;
2615 }
2616
2617 /* Allocate column map. */
2618 m_spec.m_col_map = new int[m_spec.m_dtrans_ncols];
2619 for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
2620 {
2621 m_spec.m_col_map[i] = -1;
2622 }
2623
2624 /* Process columns for reduction. */
2625 for (reduced_ncols = 0; ; ++reduced_ncols)
2626 {
2627 if (CUtility.DEBUG)
2628 {
2629 for (i = 0; i < reduced_ncols; ++i)
2630 {
2631 CUtility.assert(-1 != m_spec.m_col_map[i]);
2632 }
2633 }
2634
2635 for (i = reduced_ncols; i < m_spec.m_dtrans_ncols; ++i)
2636 {
2637 if (-1 == m_spec.m_col_map[i])
2638 {
2639 break;
2640 }
2641 }
2642
2643 if (i >= m_spec.m_dtrans_ncols)
2644 {
2645 break;
2646 }
2647
2648 if (CUtility.DEBUG)
2649 {
2650 CUtility.assert(false == set.get(i));
2651 CUtility.assert(-1 == m_spec.m_col_map[i]);
2652 }
2653
2654 set.set(i);
2655
2656 m_spec.m_col_map[i] = reduced_ncols;
2657
2658 /* UNDONE: Optimize by doing all comparisons in one batch. */
2659 for (j = i + 1; j < m_spec.m_dtrans_ncols; ++j)
2660 {
2661 if (-1 == m_spec.m_col_map[j] && true == col_equiv(i,j))
2662 {
2663 m_spec.m_col_map[j] = reduced_ncols;
2664 }
2665 }
2666 }
2667
2668 /* Reduce columns. */
2669 k = 0;
2670 for (i = 0; i < m_spec.m_dtrans_ncols; ++i)
2671 {
2672 if (set.get(i))
2673 {
2674 ++k;
2675
2676 set.clear(i);
2677
2678 j = m_spec.m_col_map[i];
2679
2680 if (CUtility.DEBUG)
2681 {
2682 CUtility.assert(j <= i);
2683 }
2684
2685 if (j == i)
2686 {
2687 continue;
2688 }
2689
2690 col_copy(j,i);
2691 }
2692 }
2693 m_spec.m_dtrans_ncols = reduced_ncols;
2694 /* truncate m_dtrans at proper length (freeing extra) */
2695 trunc_col();
2696
2697 if (CUtility.DEBUG)
2698 {
2699 CUtility.assert(k == reduced_ncols);
2700 }
2701
2702 /* Allocate row map. */
2703 nrows = m_spec.m_dtrans_vector.size();
2704 m_spec.m_row_map = new int[nrows];
2705 for (i = 0; i < nrows; ++i)
2706 {
2707 m_spec.m_row_map[i] = -1;
2708 }
2709
2710 /* Process rows to reduce. */
2711 for (reduced_nrows = 0; ; ++reduced_nrows)
2712 {
2713 if (CUtility.DEBUG)
2714 {
2715 for (i = 0; i < reduced_nrows; ++i)
2716 {
2717 CUtility.assert(-1 != m_spec.m_row_map[i]);
2718 }
2719 }
2720
2721 for (i = reduced_nrows; i < nrows; ++i)
2722 {
2723 if (-1 == m_spec.m_row_map[i])
2724 {
2725 break;
2726 }
2727 }
2728
2729 if (i >= nrows)
2730 {
2731 break;
2732 }
2733
2734 if (CUtility.DEBUG)
2735 {
2736 CUtility.assert(false == set.get(i));
2737 CUtility.assert(-1 == m_spec.m_row_map[i]);
2738 }
2739
2740 set.set(i);
2741
2742 m_spec.m_row_map[i] = reduced_nrows;
2743
2744 /* UNDONE: Optimize by doing all comparisons in one batch. */
2745 for (j = i + 1; j < nrows; ++j)
2746 {
2747 if (-1 == m_spec.m_row_map[j] && true == row_equiv(i,j))
2748 {
2749 m_spec.m_row_map[j] = reduced_nrows;
2750 }
2751 }
2752 }
2753
2754 /* Reduce rows. */
2755 k = 0;
2756 for (i = 0; i < nrows; ++i)
2757 {
2758 if (set.get(i))
2759 {
2760 ++k;
2761
2762 set.clear(i);
2763
2764 j = m_spec.m_row_map[i];
2765
2766 if (CUtility.DEBUG)
2767 {
2768 CUtility.assert(j <= i);
2769 }
2770
2771 if (j == i)
2772 {
2773 continue;
2774 }
2775
2776 row_copy(j,i);
2777 }
2778 }
2779 m_spec.m_dtrans_