java.lang.Object
  pal.io.NexusTokenizer
A simple token pull-parser for the NEXUS file format as specified in:
Maddison, D. R., Swofford, D. L., & Maddison, W. P., Systematic Biology, 46(4), pp. 590 - 621.
The parser is designed to break a NEXUS file into tokens which are read individually. Tokens come in four different types:
- Word.
- Punctuation.
- Whitespace: ' ' or '\t'. Whitespace is only returned if the option is set.
- Newline: '\r', '\n' or '\r\n'. The parser will return the character unless convertNL is set, in which case it will replace the token with the user-specified newline character.
The parser has a set of options allowing tokens to be modified before they are returned (such as case modification or newline substitution).
Each read by the parser moves forward in the stream. At present there is no support for unreading tokens or for moving bidirectionally through the stream.
NB: in this implementation, the token #NEXUS is considered special. When read by the parser, it is returned as one token, '#NEXUS', not two, '#' and 'NEXUS'.
This special meaning is reflected in its having its own token type (HEADER_TOKEN).
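import java.io.FileReader;
import java.io.PushbackReader;
import pal.io.NexusTokenizer;

// The constructor and readToken() can throw java.io.IOException;
// readToken() can also throw NexusParseException.
NexusTokenizer ntp = new NexusTokenizer(new PushbackReader(new FileReader("afile")));
ntp.setReadWhiteSpace(false);                            // ignore whitespace
ntp.setIgnoreComments(true);                             // ignore comments
ntp.setWordModification(NexusTokenizer.WORD_UPPERCASE);  // all tokens in uppercase
String nToken = ntp.readToken();
while (nToken != null) {
    System.out.println("Token: " + nToken);
    System.out.println("Col: " + ntp.getCol());
    System.out.println("Row: " + ntp.getRow());
    nToken = ntp.readToken();  // read the next token; returns null at EOF
}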
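The note above about '#NEXUS' being returned as a single token can be pictured with a small lookahead sketch. This is not the PAL implementation: the class HeaderTokenSketch and its methods are hypothetical names, and the case-insensitive match is an assumption. The sketch also does not check that the header word ends at a word boundary.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of pal.io.
class HeaderTokenSketch {
    private static final String HEADER_WORD = "NEXUS";

    // Called once a '#' has been consumed from the reader.
    // Returns '#' plus the header word if it follows, otherwise just "#".
    static String readHashToken(PushbackReader in) throws IOException {
        char[] buf = new char[HEADER_WORD.length()];
        int n = 0;
        while (n < buf.length) {
            int c = in.read();
            if (c == -1) {
                break; // end of stream before a full word was read
            }
            buf[n++] = (char) c;
        }
        String peeked = new String(buf, 0, n);
        if (peeked.equalsIgnoreCase(HEADER_WORD)) { // assumption: case-insensitive
            return "#" + peeked;
        }
        if (n > 0) {
            in.unread(buf, 0, n); // give the characters back to the stream
        }
        return "#";
    }

    // Convenience wrapper: tokenizes the text that follows a '#'.
    static String tokenAfterHash(String rest) {
        // The pushback buffer must be able to hold the whole lookahead.
        try (PushbackReader in =
                 new PushbackReader(new StringReader(rest), HEADER_WORD.length())) {
            return readHashToken(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenAfterHash("NEXUS\nBEGIN DATA;"));
        System.out.println(tokenAfterHash("comment"));
    }
}
```

The pushback buffer is sized to the length of the header word so that a failed match can be undone in full.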
Field Summary | |
static char | ADDITION |
static char | ASTERIX |
static char | B_SLASH |
static char | B_TICK |
static char | C_RETURN |
static char | COLON |
static char | COMMA |
static char | D_QUOTE |
static char | DASH |
static char | EQUALS |
static char | F_SLASH |
static char | G_THAN |
static char | HASH |
static int | HEADER_TOKEN | Flag indicating last token read was the header token #NEXUS
static char | L_BRACE |
static char | L_BRACKET |
static char | L_FEED |
static char | L_PARENTHESIS |
static char | L_THAN |
static int | NEWLINE_TOKEN | Flag indicating last token read was a newline symbol/word
static char | PERIOD |
static int | PUNCTUATION_TOKEN | Flag indicating last token read was a punctuation symbol
static char | R_BRACE |
static char | R_BRACKET |
static char | R_PARENTHESIS |
static char | S_QUOTE |
static char | SEMI_COLON |
static char | SPACE |
static char | TAB |
static int | UNDEFINED_TOKEN | Flag indicating last token read was undefined
static int | WHITESPACE_TOKEN | Flag indicating last token read was whitespace
static int | WORD_LOWERCASE | Flag indicating words should be converted to lowercase
static int | WORD_TOKEN | Flag indicating last token read was a word
static int | WORD_UNMODIFIED | Flag indicating words should be untouched
static int | WORD_UPPERCASE | Flag indicating words should be converted to uppercase
Constructor Summary | |
NexusTokenizer(java.io.PushbackReader pr) | Constructor for a NexusTokenizer
NexusTokenizer(java.lang.String file) | Constructor for a NexusTokenizer
Method Summary | |
boolean | convertNewLine() | Gets the flag indicating whether this parser instance should convert newline characters.
int | getCol() | Gets the current column position of the cursor.
java.lang.String | getLastReadToken() | Returns the last read token.
int | getLastTokenType() | Determines the type of the last read token.
int | getRow() | Gets the current row position of the cursor.
int | getWordModification() | Gets the word modification flag currently in use.
java.lang.String | readToken() | Reads a token in from the underlying stream.
boolean | readWhiteSpace() | Gets the flag indicating whether or not this parser object is reading (and returning) whitespace.
java.lang.String | seek(int tokenType) | Seeks through the stream to find the next token of the specified type.
java.lang.String | seek(java.lang.String token) | Seeks through the stream to find the token argument.
void | setConvertNewLine(boolean b) | Sets the convertNL flag.
void | setIgnoreComments(boolean b) | Sets the ignoreComments flag.
void | setNewLineChar(char nl) | Sets the character that newline characters are converted into.
void | setReadWhiteSpace(boolean b) | Sets the readWS flag.
void | setWordModification(int flag) | Sets the flag value for word modification.
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final char L_PARENTHESIS
public static final char R_PARENTHESIS
public static final char L_BRACKET
public static final char R_BRACKET
public static final char L_BRACE
public static final char R_BRACE
public static final char F_SLASH
public static final char B_SLASH
public static final char COMMA
public static final char SEMI_COLON
public static final char COLON
public static final char EQUALS
public static final char ASTERIX
public static final char S_QUOTE
public static final char D_QUOTE
public static final char B_TICK
public static final char ADDITION
public static final char DASH
public static final char L_THAN
public static final char G_THAN
public static final char HASH
public static final char PERIOD
public static final char L_FEED
public static final char C_RETURN
public static final char TAB
public static final char SPACE
public static final int WORD_UPPERCASE
public static final int WORD_LOWERCASE
public static final int WORD_UNMODIFIED
public static final int UNDEFINED_TOKEN
public static final int WORD_TOKEN
public static final int PUNCTUATION_TOKEN
public static final int NEWLINE_TOKEN
public static final int WHITESPACE_TOKEN
public static final int HEADER_TOKEN
Constructor Detail |
public NexusTokenizer(java.lang.String file) throws java.io.IOException
Constructor for a NexusTokenizer.
Parameters:
file - File name for the NEXUS file
Throws:
java.io.IOException - I/O errors

public NexusTokenizer(java.io.PushbackReader pr) throws java.io.IOException
Constructor for a NexusTokenizer.
Parameters:
pr - PushbackReader
Throws:
java.io.IOException - I/O errors

Method Detail |
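public boolean readWhiteSpace()
Gets the readWS flag, which indicates whether this parser object is reading (and returning) whitespace.

public boolean convertNewLine()
Gets the convertNL flag, which indicates whether this parser instance should convert newline characters.

public void setReadWhiteSpace(boolean b)
Sets the readWS flag. True means that the parser will return whitespace characters as a token (where whitespace = ' ' or '\t').
Parameters:
b - flag value for readWS
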
public void setConvertNewLine(boolean b)
Sets the convertNL flag. True means that the parser will convert newline characters ('\r', '\n' or '\r\n') into either the default ('\n' if setNewLineChar() is not called) or a user-specified newline character.
Parameters:
b - flag value for convertNL
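The newline handling described for convertNL can be sketched as follows. This is an illustration under stated assumptions, not the PAL source: NewlineSketch, readNewline and firstNewline are hypothetical names, and the sketch assumes the caller has already read a '\r' or '\n' from the stream.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of pal.io.
class NewlineSketch {
    // 'first' is the '\r' or '\n' that was just read from 'in'.
    // Returns the newline token, optionally replaced by newLineChar.
    static String readNewline(PushbackReader in, int first,
                              boolean convertNL, char newLineChar) throws IOException {
        String raw = String.valueOf((char) first);
        if (first == '\r') {
            int next = in.read();
            if (next == '\n') {
                raw = "\r\n";          // treat CR LF as a single newline token
            } else if (next != -1) {
                in.unread(next);       // not part of the newline, push it back
            }
        }
        return convertNL ? String.valueOf(newLineChar) : raw;
    }

    // Convenience wrapper; assumes 'text' starts with a newline sequence.
    static String firstNewline(String text, boolean convertNL, char newLineChar) {
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            return readNewline(in, in.read(), convertNL, newLineChar);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With conversion enabled, all three newline forms collapse to the same replacement character, which makes row counting and downstream comparisons independent of the platform that wrote the file.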

public void setIgnoreComments(boolean b)
Sets the ignoreComments flag. True means that the tokenizer will ignore comments (i.e. sections of a NEXUS file delimited by '[...]'). When set to true, the tokenizer will return the first token available after a comment.
Parameters:
b - flag value for ignoreComments
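One way to picture what ignoring comments involves is the string-based sketch below. It is only an illustration: CommentSkipSketch and skipComments are hypothetical names, and tracking nesting depth is an assumption, since this page does not say whether the tokenizer handles nested '[...]' comments.

```java
// Hypothetical illustration only; not part of pal.io.
class CommentSkipSketch {
    // Returns the index of the first character that follows any run of
    // '[...]' comments starting exactly at 'pos'. If there is no comment
    // at 'pos', 'pos' is returned unchanged.
    static int skipComments(String text, int pos) {
        while (pos < text.length() && text.charAt(pos) == '[') {
            int depth = 0;
            do {
                char c = text.charAt(pos++);
                if (c == '[') {
                    depth++;           // assumption: comments may nest
                } else if (c == ']') {
                    depth--;
                }
            } while (depth > 0 && pos < text.length());
            if (depth > 0) {
                throw new IllegalArgumentException("unterminated comment");
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String line = "[a comment]DIMENSIONS";
        System.out.println(line.substring(skipComments(line, 0)));
    }
}
```

The sketch does not skip whitespace between a comment and the next token; a tokenizer would handle that in its normal read loop.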
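
public void setNewLineChar(char nl)
Sets the character that newline characters are converted into.
Parameters:
nl - Replacement newline character

public int getCol()
Gets the current column position of the cursor.

public int getRow()
Gets the current row position of the cursor.

public int getWordModification()
Gets the word modification flag currently in use.
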
public void setWordModification(int flag)
Sets the flag value for word modification. WORD_UNMODIFIED indicates that tokens should be returned in the case in which they are read from the stream. This value can be set at any time between token reads, so the next token read will be altered according to the current value. The default is WORD_UNMODIFIED.
Parameters:
flag - Flag value, one of WORD_LOWERCASE, WORD_UPPERCASE or WORD_UNMODIFIED
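The effect of the three word modification flags can be sketched as a simple dispatch. The constant values below are hypothetical stand-ins (this page does not list the real values), and WordModificationSketch and modify are names invented for the illustration.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of pal.io.
class WordModificationSketch {
    // Stand-in values; the real constants are defined by NexusTokenizer.
    static final int WORD_UNMODIFIED = 0;
    static final int WORD_UPPERCASE = 1;
    static final int WORD_LOWERCASE = 2;

    // Applies the case modification selected by 'flag' to a word token.
    static String modify(String word, int flag) {
        switch (flag) {
            case WORD_UPPERCASE:
                return word.toUpperCase(Locale.ROOT);
            case WORD_LOWERCASE:
                return word.toLowerCase(Locale.ROOT);
            case WORD_UNMODIFIED:
                return word;
            default:
                throw new IllegalArgumentException("unknown flag: " + flag);
        }
    }

    public static void main(String[] args) {
        System.out.println(modify("Begin", WORD_UPPERCASE));
        System.out.println(modify("Begin", WORD_LOWERCASE));
    }
}
```

Normalizing case at read time is convenient for NEXUS keywords, since a caller can then compare tokens against a single spelling such as "BEGIN".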

public java.lang.String readToken() throws java.io.IOException, NexusParseException
Reads a token in from the underlying stream. For newline tokens, the parser will return the character read unless convertNL is set, in which case it will replace the token with the user-specified newline character.
Returns:
String token or null if EOF is reached (i.e. no more tokens to read)
Throws:
java.io.IOException - I/O errors
NexusParseException - Parsing errors

public int getLastTokenType()
Determines the type of the last read token. After readToken() has been called, the type of token returned can be determined by calling getLastTokenType(). This returns one of six different constants:
- UNDEFINED_TOKEN: default before anything is read from the stream
- WORD_TOKEN: word token was read
- PUNCTUATION_TOKEN: punctuation token was read
- NEWLINE_TOKEN: newline token was read
- WHITESPACE_TOKEN: whitespace token was read (never returned unless whitespace is being returned)
- HEADER_TOKEN: last token was the special word #NEXUS
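The token types listed above can be illustrated with a classifier that works on the token text alone. This is a sketch under assumptions: TokenTypeSketch and classify are hypothetical, the constant values are stand-ins, and the punctuation set is inferred from the field names on this page (L_PARENTHESIS, SEMI_COLON and so on) rather than from their documented values.

```java
// Hypothetical illustration only; not part of pal.io.
class TokenTypeSketch {
    // Stand-in values; the real constants are defined by NexusTokenizer.
    static final int UNDEFINED_TOKEN = 0;
    static final int WORD_TOKEN = 1;
    static final int PUNCTUATION_TOKEN = 2;
    static final int NEWLINE_TOKEN = 3;
    static final int WHITESPACE_TOKEN = 4;
    static final int HEADER_TOKEN = 5;

    // Assumed punctuation set, inferred from the char field names.
    static final String PUNCTUATION = "()[]{}/\\,;:=*'\"`+-<>#.";

    // Classifies a token by its text.
    static int classify(String token) {
        if (token == null || token.isEmpty()) {
            return UNDEFINED_TOKEN;
        }
        if (token.equalsIgnoreCase("#NEXUS")) {
            return HEADER_TOKEN;
        }
        if (token.equals("\r") || token.equals("\n") || token.equals("\r\n")) {
            return NEWLINE_TOKEN;
        }
        if (token.equals(" ") || token.equals("\t")) {
            return WHITESPACE_TOKEN;
        }
        if (token.length() == 1 && PUNCTUATION.indexOf(token.charAt(0)) >= 0) {
            return PUNCTUATION_TOKEN;
        }
        return WORD_TOKEN;
    }
}
```

Classifying from the text alone breaks down once a custom newline character has been substituted, which is one reason a tokenizer would record the type at read time and expose it through a call such as getLastTokenType().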

public java.lang.String seek(int tokenType) throws java.io.IOException, NexusParseException
Seeks through the stream to find the next token of the specified type.
Returns:
String token or null if EOF is reached (i.e. no more tokens to read)
Throws:
java.io.IOException - I/O errors
NexusParseException - Thrown by parsing errors or if tokenType == WHITESPACE_TOKEN && readWhiteSpace() == false

public java.lang.String seek(java.lang.String token) throws java.io.IOException, NexusParseException
Seeks through the stream to find the token argument.
Returns:
String token or null if token is not found (i.e. EOF is reached)
Throws:
java.io.IOException - I/O errors
NexusParseException - Thrown by parsing errors or if token is whitespace && readWhiteSpace() == false

public java.lang.String getLastReadToken()
Returns the last read token. readToken() stores the returned token so that it can be retrieved again. However, each consuming readToken() call replaces this buffer with the new token.
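The behaviour of seek(java.lang.String token) described above (consume tokens until a match, return null at EOF, reject a whitespace target when whitespace is not being returned) can be sketched over a plain iterator of tokens. SeekSketch is a hypothetical name, the exact-match comparison is an assumption, and IllegalStateException stands in for NexusParseException, which is not available outside PAL.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration only; not part of pal.io.
class SeekSketch {
    // Consumes tokens until one equals 'target'. Returns the matching token,
    // or null if the iterator is exhausted first (the EOF case).
    static String seek(Iterator<String> tokens, String target, boolean readWhiteSpace) {
        boolean targetIsWhitespace = target.equals(" ") || target.equals("\t");
        if (targetIsWhitespace && !readWhiteSpace) {
            // Stand-in for the NexusParseException described above.
            throw new IllegalStateException(
                "cannot seek whitespace while whitespace is not being returned");
        }
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.equals(target)) { // assumption: exact, case-sensitive match
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Iterator<String> it =
            Arrays.asList("#NEXUS", "BEGIN", "DATA", ";", "END", ";").iterator();
        System.out.println(seek(it, "DATA", false));
    }
}
```

Because each read only moves forward, every token skipped during a seek is consumed; a caller that needs the skipped tokens has to collect them itself.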