컴파일러 수업 정리

created : 2022-03-25T13:26:53+00:00
modified : 2022-03-25T13:27:33+00:00

lecture

Introduction

The Move to Higher-Level Languages

Using High-Level Languages is a Free Lunch?

The Major Role of Language Processors

Two Representative Strategies

  Compilation Interpretation
What to translate An entire source program One statement of a source program
When to translate Once before the program runs Every time when the statement is executed
Translation Result A target program (equivalent to the source program) Target code (equivalent to the statement)
Examples C, C++ Javascript

Common Language-Processing Systems

Requirements for Designing Good Compilers

Structure of Modern Compilers

Lexical Analyzer (Scanner)

Syntax Analyzer (Parser)

Semantic Analyzer

Intermediate Code Generator

Code Optimization (Optional)

Code Generator

Lexical Analysis

Part1. Specification of Tokens

Overview

Definition: Tokens

Definition: Lexemes

Class of Tokens

Lexical Anlysis does

  1. Partitioning input strings into sub-strings (lexemes)
  2. Identifying the token of each lexeme

For the specifcation of tokens

Definition: Alphabet, String, and Language

Operations on Strings

Operations on Languages

Regurlar Expressions

Regular expression Expressed regular language
$\epsilon$ $L(\epsilon) = { \epsilon }$
$a$ $L(a) = { a }$, where $a$ is a symbol in alphabet $\Sigma$
$r_1 \vert r_2$ $L(r_1) \cup L(r_2)$, where $r_1$ and $r_2$ are regular expressions
$r_1 r_2$ $L(r_1r_2) = L(r_1) L(r_2) = { s_1 s_2 \vert s_1 \in L(r_1) \text{ and } s_2 \in L(r_2)}$
$r^*$ $L(r^*) = \Cup_{i \ge 0} L(r^i)$

Rules for Regular Expressions

Examples of Specifying Tokens

To Recognize Tokens

  1. Merge the regular expression of tokens $Merged = Keyword \vert Identifier \vert Comparison \vert Float \vert Whitespace \vert \cdots$
  2. When an input stream $a_1 a_2 a_3 \cdots a_n$ is given,
     mIdx = 0;
     for i <= i <= n
       if a_1 a_2 ... a_n \in L(Merged), mIdx = i;
     end
     partition and classify a_1 a_2 ... a_{mIdx}
    
  3. Do the step 2 for the remaning input stream

Summary

Part 2. Recognition of Tokens

Finite Automata

Deterministic vs. Non-deterministic

  DFA NFA
transitions All transitions are deterministic Some transitions could be non-deterministic
$\epsilon$-move x o
# paths for a given input Only one One or more
Accepting condition For a given input, its path must end in one of accepting states For a given input, there must be at least one path ending in one of accepting states
Pros Fast to execute Simple to represent (easy to make/understand)
Cons Complex ->space problem (exponentially larger than NFA) Slow -> performance problem (several paths)

Procedures for Implementing Lexical Analyzers

Regular Expressions to NFAs

NFAs to DFAs

Summary

Syntax Analysis (Parser)

Part 1. Context-Free Grammars (CFG)

Syntax Analyzer

  1. Decides whether a given set of tokens is valid or not
    • Parse tree:
      • It shows how the start symbol of a grammar derives a string in the language
      • Given a context-free grammar, a parse tree is a tree is a tree with the following properties:
        • The root is labeled by the start symbol
        • Each leaf is labeled by a terminal or by $\epsilon$
        • Each interior node is labeled by a non-terminal
        • If $A$ is the non-terminal labeling some interior node and $X_1, …, X_n$ are the labels of the children of that node from left to right, then there must be a production $A \rightarrow X_1 X_2 \cdots X_n$
  2. Creates a tree-like intermediate representation (e.g., parse tree) that depicts the grammatical structure of the token stream

Why don’t we use regular expressions?

Context-Free Grammars (CFG)

Derivations

Token Validation Test

  1. Decides whether a given set of tokens is valid or not
    • Q. How to specify the rule for deciding valid token set?
    • A. Make a context free grammar G based on the rule of a programming language
    • Q. How to distinguish between valid and invalid token sets?
    • A. Check whether the given token set can be derived from the context free grammar G