Lexical Analyser for Languages like C, PASCAL, etc. (Mini Project)


Lexical analysis, or scanning, is the process where the stream of characters making up the source program is read from left to right and grouped into tokens. Tokens are sequences of characters with a collective meaning. There are usually only a small number of token kinds for a programming language: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words. The lexical analyzer takes a source program as input and produces a stream of tokens as output. The lexical analyzer might recognize particular instances of tokens such as:

  • 3 or 255 for an integer constant token
  • "Fred" or "Wilma" for a string constant token
  • numTickets or queue for a variable token
Such specific instances are called lexemes.
A lexeme is the actual character sequence forming a token; the token is the general class that a lexeme belongs to. Some tokens have exactly one lexeme (e.g., the > character); for others, there are many lexemes (e.g., integer constants). The scanner is tasked with determining that the input stream can be divided into valid symbols in the source language, but has no smarts about which token should come where. Few errors can be detected at the lexical level alone because the scanner has a very localized view of the source program, without any context. The scanner can report characters that are not valid tokens (e.g., an illegal or unrecognized symbol) and a few other malformed entities (illegal characters within a string constant, unterminated comments, etc.). It does not look for or detect garbled sequences, tokens out of place, undeclared identifiers, misspelled keywords, mismatched types, and the like. For example, the following input will not generate any errors in the lexical analysis phase, because the scanner has no concept of the appropriate arrangement of tokens for a declaration. The syntax analyzer will catch this error later, in the next phase.
int a double } switch b[2] =;
Furthermore, the scanner has no idea how tokens are grouped. In the above sequence, it returns b, [, 2, and ] as four separate tokens, having no idea they collectively form an array access. The lexical analyzer can be a convenient place to carry out some other chores like stripping out comments and white space between tokens and perhaps even some features like macros and conditional compilation (although often these are handled by some sort of preprocessor which filters the input before the compiler runs).


This is a simulated lexical analyser for high-level languages like C, PASCAL, etc. I have provided a sample text file from which the source code reads the dummy program and analyses it. The program can be extended by adding more keywords and token classes.

Code:
