The translator is ... Types of translators. Converting and translating programs

Programs, like people, need a translator to get from one language to another.

Basic concepts

A program is a linguistic representation of a computation: i → P → P(i). An interpreter is a program whose input is a program P together with some input data x; it executes P on x: I(P, x) = P(x). The fact that a single interpreter can execute every possible program (that can be represented in a formal system) is a very deep and significant discovery of Turing.

The processor is an interpreter of programs in machine language. Building interpreters for high-level languages directly is usually too expensive, so programs written in them are translated into a form that is easier to interpret.
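A minimal sketch of I(P, x) = P(x), assuming a hypothetical toy stack-machine instruction set (`push`, `load`, `mul`, `add` are invented names, not a real processor's instructions). The function `interpret` plays the role of the processor: it takes a program P and an input x and returns P(x).

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction: (opcode, optional integer operand)
Instr = Tuple[str, Optional[int]]

def interpret(program: List[Instr], x: int) -> int:
    """I(P, x): execute program P on input x and return P(x)."""
    stack: List[int] = []
    for op, arg in program:
        if op == "push":          # push a constant operand
            stack.append(arg)
        elif op == "load":        # push the program's input x
            stack.append(x)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

# P computes 2*x + 1
P: List[Instr] = [("push", 2), ("load", None), ("mul", None),
                  ("push", 1), ("add", None)]
print(interpret(P, 5))  # 11
```

The same `interpret` runs any program written in this toy instruction set, which is the point of the paragraph above in miniature.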

Some types of translators have very strange names:

  • An assembler translates programs written in assembly language into machine language.
  • A compiler translates from a high-level language to a lower-level language.

A translator is a program that takes as input a program P in some language S and produces a program Q in a language T such that both have the same semantics: P → translator → Q. That is, ∀x. P(x) = Q(x).
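The equivalence condition can be illustrated with a small sketch built on invented assumptions: a hypothetical source language S of expression trees, a hypothetical target language T of postfix stack code, and a `translate` function between them. Checking ∀x. P(x) = Q(x) on a sample of inputs only tests the property; it does not prove it.

```python
from typing import List, Optional, Tuple

# Hypothetical source language S: expression trees built from tuples
#   ("num", n) | ("x",) | ("add", e1, e2) | ("mul", e1, e2)
# Hypothetical target language T: postfix stack code [(opcode, operand), ...]
Code = List[Tuple[str, Optional[int]]]

def eval_source(e: tuple, x: int) -> int:
    """Semantics of S: computes P(x)."""
    if e[0] == "num":
        return e[1]
    if e[0] == "x":
        return x
    a, b = eval_source(e[1], x), eval_source(e[2], x)
    if e[0] == "add":
        return a + b
    if e[0] == "mul":
        return a * b
    raise ValueError(f"unknown node: {e[0]}")

def translate(e: tuple) -> Code:
    """The translator S -> T: emits code for the operands, then the operator."""
    if e[0] == "num":
        return [("push", e[1])]
    if e[0] == "x":
        return [("load", None)]
    return translate(e[1]) + translate(e[2]) + [(e[0], None)]

def run_target(code: Code, x: int) -> int:
    """Semantics of T: computes Q(x) on a tiny stack machine."""
    stack: List[int] = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(x)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

P = ("add", ("mul", ("num", 2), ("x",)), ("num", 1))   # 2*x + 1
Q = translate(P)
# Check P(x) = Q(x) on a finite sample of inputs (a test, not a proof).
assert all(eval_source(P, x) == run_target(Q, x) for x in range(-10, 11))
```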

Translating the entire program into an interpretable form before running it is called ahead-of-time (AOT) compilation. AOT translators can be chained, and the last one in the chain is often an assembler, for example:

Source code → Compiler (translator) → Assembly code → Assembler (translator) → Machine code → CPU (interpreter).
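A toy version of this chain, assuming an invented expression format, invented mnemonics (`PUSH`, `LOAD`, `ADD`, `MUL`) and an invented numeric encoding: a compiler emits assembly text, an assembler turns it into integer "machine code", and `cpu` interprets that code. None of this corresponds to a real instruction set.

```python
from typing import List

# Stage 1, compiler: hypothetical expression tree -> assembly text
#   ("num", n) | ("x",) | ("add" | "mul", left, right)
def compile_to_asm(e: tuple) -> List[str]:
    if e[0] == "num":
        return [f"PUSH {e[1]}"]
    if e[0] == "x":
        return ["LOAD"]
    return compile_to_asm(e[1]) + compile_to_asm(e[2]) + [e[0].upper()]

# Stage 2, assembler: assembly text -> "machine code" (invented numeric opcodes)
OPCODES = {"PUSH": 1, "LOAD": 2, "ADD": 3, "MUL": 4}

def assemble(lines: List[str]) -> List[int]:
    code: List[int] = []
    for line in lines:
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic])
        code.extend(int(op) for op in operands)
    return code

# Stage 3, CPU: interprets the numeric machine code with input x
def cpu(code: List[int], x: int) -> int:
    stack: List[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == 1:                   # PUSH <immediate>
            stack.append(code[pc])
            pc += 1
        elif op == 2:                 # LOAD the input x
            stack.append(x)
        elif op in (3, 4):            # ADD, MUL
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == 3 else a * b)
        else:
            raise ValueError(f"illegal opcode {op} at {pc - 1}")
    return stack.pop()

source = ("add", ("mul", ("num", 2), ("x",)), ("num", 1))   # 2*x + 1
asm = compile_to_asm(source)
machine_code = assemble(asm)
print(asm)                    # ['PUSH 2', 'LOAD', 'MUL', 'PUSH 1', 'ADD']
print(machine_code)           # [1, 2, 2, 4, 1, 1, 3]
print(cpu(machine_code, 5))   # 11
```

Each stage is a separate program with its own input and output language, which is what allows the stages to be chained.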

Just-in-time (JIT), or dynamic, compilation occurs when part of a program is translated while other, already compiled parts are executing. JIT translators remember what they have already translated so that they do not have to process the same source code again and again. They can even perform adaptive compilation and recompilation based on the program's behavior at runtime.
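A minimal sketch of the "remember what has already been translated" idea, using Python's built-in `compile()` to translate a source string into a code object on first use and caching the result. This is an illustration only: the class `TinyJIT` is hypothetical, and it neither generates machine code nor performs the adaptive recompilation that real JIT compilers do.

```python
from types import CodeType
from typing import Dict

class TinyJIT:
    """Hypothetical toy: compiles each distinct source string once, then
    reuses the cached code object on every later call."""

    def __init__(self) -> None:
        self._cache: Dict[str, CodeType] = {}
        self.compilations = 0

    def run(self, source: str, **env: int) -> int:
        code = self._cache.get(source)
        if code is None:
            # Translate only on first use, via Python's built-in compile().
            code = compile(source, "<jit>", "eval")
            self._cache[source] = code
            self.compilations += 1
        # No builtins are exposed to the expression (this is not a sandbox).
        return eval(code, {"__builtins__": {}}, env)

jit = TinyJIT()
results = [jit.run("2 * x + 1", x=i) for i in range(1000)]
print(results[:3], jit.compilations)  # [1, 3, 5] 1
```

A thousand executions cost a single translation; a real JIT would additionally profile hot code and recompile it more aggressively.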

Many languages allow you to execute code during translation and compile new code at runtime.

Stages of translation

Translation consists of two stages, analysis and synthesis:

Source code → Analyzer → Conceptual representation → Generator (synthesizer) → Target code.

There are two reasons for this:

  • No other approach works: word-by-word translation simply fails.
  • It is a good engineering solution: to write translators for M source languages and N target languages, you need only M + N simple programs (half-compilers) instead of M × N complex ones (full translators).
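The M + N argument can be seen in a small sketch with M = 2 and N = 2, under invented assumptions: two analyzers (front ends, for prefix and postfix notation) produce one shared conceptual representation, and two generators (back ends, for C-like infix text and stack-machine instructions) consume it. Four components cover all four source/target pairs; the tuple format and mnemonics are hypothetical.

```python
import re
from typing import List

# Shared conceptual representation (hypothetical):
#   ("num", n) | ("var", name) | (op, left, right) with op in {"+", "*"}

def leaf(tok: str) -> tuple:
    return ("num", int(tok)) if re.fullmatch(r"-?\d+", tok) else ("var", tok)

# Front end 1: prefix notation, e.g. "(+ (* 2 x) 1)"
def parse_prefix(text: str) -> tuple:
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def expr() -> tuple:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return leaf(tok)
        op = tokens[pos]
        pos += 1
        left, right = expr(), expr()
        pos += 1                      # skip the closing ")"
        return (op, left, right)
    return expr()

# Front end 2: postfix notation, e.g. "2 x * 1 +"
def parse_postfix(text: str) -> tuple:
    stack: List[tuple] = []
    for tok in text.split():
        if tok in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append((tok, left, right))
        else:
            stack.append(leaf(tok))
    return stack.pop()

# Back end 1: fully parenthesized infix for a C- or JavaScript-like language
def gen_infix(e: tuple) -> str:
    if e[0] in ("num", "var"):
        return str(e[1])
    return f"({gen_infix(e[1])} {e[0]} {gen_infix(e[2])})"

# Back end 2: instructions for a stack machine
def gen_stack(e: tuple) -> List[str]:
    if e[0] == "num":
        return [f"PUSH {e[1]}"]
    if e[0] == "var":
        return [f"LOAD {e[1]}"]
    op = {"+": "ADD", "*": "MUL"}[e[0]]
    return gen_stack(e[1]) + gen_stack(e[2]) + [op]

for ir in (parse_prefix("(+ (* 2 x) 1)"), parse_postfix("2 x * 1 +")):
    print(gen_infix(ir), gen_stack(ir))
# printed twice: ((2 * x) + 1) ['PUSH 2', 'LOAD x', 'MUL', 'PUSH 1', 'ADD']
```

Adding a third source notation would need one more front end, not one more translator per target.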

Nevertheless, in practice the conceptual representation is rarely expressive and powerful enough to cover all conceivable source and target languages, although some representations come close.

Real compilers go through many stages. When creating your own compiler, you do not need to repeat all the hard work that others have already done in building intermediate representations and generators. You can translate your language directly into JavaScript or C and let existing JavaScript engines and C compilers do the rest. You can also reuse existing intermediate representations and virtual machines.

Translator notation

A translator is a program or technical tool that involves three languages: the source language, the target language, and the base language (the language the translator itself is written in). They can be drawn as a T-diagram, with the source language on the left, the target language on the right, and the base language below.

There are three types of compilers:

  • A translator is a self-compiler if its source language is the same as its base language.
  • A compiler whose target language is the same as its base language is called self-resident.
  • A translator is a cross-compiler if its target language differs from its base language.
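The T-diagram and the three definitions above can be modeled directly. In this sketch `Translator` and `compile_translator` are hypothetical names, and `compile_translator` encodes a standard T-diagram rule that the text above does not state: running a translator's code through another translator whose source language is that base language yields the same translator written in the other translator's target language. This is how bootstrapping works.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Translator:
    """A T-diagram: translates `source` into `target` and is written in `base`."""
    source: str
    target: str
    base: str

    def kinds(self) -> List[str]:
        found: List[str] = []
        if self.source == self.base:
            found.append("self-compiler")
        if self.target == self.base:
            found.append("self-resident")
        else:
            found.append("cross-compiler")
        return found

def compile_translator(t: Translator, by: Translator) -> Translator:
    """Translate the code of `t` with `by`: only t's base language changes."""
    if by.source != t.base:
        raise ValueError(f"{by} cannot translate code written in {t.base}")
    return Translator(t.source, t.target, by.target)

# Hypothetical example: bootstrapping a C compiler written in C
c_in_c = Translator("C", "x86", "C")
host_cc = Translator("C", "x86", "x86")      # an existing native C compiler
native = compile_translator(c_in_c, by=host_cc)
print(native)                                 # Translator(source='C', target='x86', base='x86')
print(native.kinds())                         # ['self-resident']
print(Translator("C", "ARM", "x86").kinds())  # ['cross-compiler']
```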

Why is it important?

Even if you never build a real compiler, it is worth knowing how compilers are made, because the same concepts are applied everywhere, for example in:

  • Text formatting;
  • Database query languages;
  • Extended computer architectures;
  • Generalized optimization problems;
  • Graphical interfaces;
  • Scripting languages;
  • Controllers;
  • Virtual machines;
  • Machine translation.

In addition, if you want to write preprocessors, linkers, loaders, debuggers, or profilers, you go through the same steps as when writing a compiler.

You can also learn to write better programs, since building a translator for a language gives you a deeper understanding of its subtleties and ambiguities. Learning the general principles of translation also helps you become a good language designer. After all, how much does it matter how elegant a language is if it cannot be implemented efficiently?

Comprehensive technology

Compiler technology covers many different areas of computer science:

  • Formal language theory: grammars, parsing, computability;
  • Computer architecture: instruction sets, RISC or CISC, pipelining, cores, clock cycles, etc.;
  • Programming language concepts: for example, sequence control, conditional execution, iteration, recursion, functional decomposition, modularity, synchronization, metaprogramming, scope, constants, subtypes, templates, type inference, prototypes, annotations, threads, monads, mailboxes, extensions, wildcards, regular expressions, transactional memory, inheritance, polymorphism, parameter modes, etc.;
  • Abstract languages and virtual machines;
  • Algorithms and data structures: regular expressions, parsing algorithms, graph algorithms, dynamic programming, learning;
  • Programming languages: syntax, semantics (static and dynamic), support for paradigms (structured, OOP, functional, logic, stack-based, concurrent, metaprogramming);
  • Software engineering (compilers are usually large and complex): localization, caching, componentization, APIs, reuse, synchronization.

Designing a compiler

Some problems that arise when developing a real translator:

  • Problems with the source language. Is it easy to compile? Is there a preprocessor? How are types handled? Are there libraries?
  • Grouping of compiler passes: single-pass or multi-pass?
  • The desired degree of optimization. A fast and dirty translation with little or no optimization may be acceptable. Excessive optimization slows the compiler down, but better code at runtime may be worth it.
  • The required level of error detection. Can the translator simply stop at the first error? When should it stop? Should the compiler be trusted to correct errors?
  • Availability of tools. Unless the source language is very small, scanner and parser generators are a must. Code-generator generators also exist, but they are less common.
  • The kind of target code to generate. Choose between pure machine code, augmented machine code, or virtual machine code. Alternatively, write a front end that emits a popular intermediate representation such as LLVM IR, RTL, or JVM bytecode, or translate source-to-source into C or JavaScript.
  • The format of the target code. You can choose assembly language, portable machine code, or a machine-code memory image.
  • Retargeting. When there are many generators (back ends), it is good to have a common front end. For the same reason, it is better to have one generator serve many front ends.

Compiler Architecture: Components

These are the main functional components of a translator that generates machine code (if the output is a C program or virtual machine code, fewer steps are needed):

  • The input program (a stream of characters) enters the scanner (lexical analyzer), which converts it into a stream of tokens.
  • The parser (syntax analyzer) builds an abstract syntax tree from the tokens.
  • The semantic analyzer gathers semantic information and checks the nodes of the tree for errors. The result is a semantic graph: an abstract syntax tree with additional properties and resolved references.
  • The intermediate code generator builds a flow graph (tuples are grouped into basic blocks).
  • The machine-independent code optimizer performs both local (within a basic block) and global (across all blocks) optimization, mostly staying within individual subroutines. It removes redundant code and simplifies computations. The result is a modified flow graph.
  • The target code generator chains the basic blocks into straight-line code with control transfers, producing an assembly-language object file with virtual registers (possibly inefficient).
  • The machine-dependent optimizer and linker allocates registers and schedules instructions, turning the assembly program into real assembly code that makes good use of pipelining.

In addition, an error-handling subsystem and a symbol table manager are used throughout.
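As an illustration of local optimization, here is a sketch of constant propagation and constant folding within a single basic block. The three-address tuple format `(dest, op, left, right)`, with `"="` meaning a plain copy, is an assumption made for this example. A real optimizer would also delete the assignments that become dead; this sketch does not.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str]            # a constant or a variable name
# Hypothetical three-address tuple: (dest, op, left, right); op "=" copies left
Instr = Tuple[str, str, Operand, Optional[Operand]]

BINARY = {"+": lambda a, b: a + b,
          "-": lambda a, b: a - b,
          "*": lambda a, b: a * b}

def fold_block(block: List[Instr]) -> List[Instr]:
    """Constant propagation and folding inside one basic block."""
    known: Dict[str, int] = {}       # variables with a known constant value
    out: List[Instr] = []
    for dest, op, a, b in block:
        # Substitute variables whose value is already known in this block.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if op == "=":
            value = a
        elif isinstance(a, int) and isinstance(b, int):
            value = BINARY[op](a, b)  # evaluate at compile time
        else:
            value = None
        if isinstance(value, int):
            known[dest] = value
            out.append((dest, "=", value, None))
        else:
            known.pop(dest, None)     # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

block = [("t1", "=", 4, None),
         ("t2", "*", "t1", 2),
         ("t3", "+", "t2", "y"),
         ("t4", "-", "t2", 3)]
for instr in fold_block(block):
    print(instr)
# ('t1', '=', 4, None)
# ('t2', '=', 8, None)
# ('t3', '+', 8, 'y')
# ('t4', '=', 5, None)
```

Restricting the `known` table to one block keeps the analysis local: no facts flow in from other blocks, so no control-flow reasoning is needed.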

Lexical analysis (scanning)

The scanner converts the stream of source-code characters into a stream of tokens, removing whitespace and comments and expanding macros.

Scanners often have to deal with issues such as case sensitivity, indentation, line breaks, and nested comments.

Errors that occur during scanning are called lexical errors and include:

  • Characters that are not in the alphabet;
  • Exceeding the maximum number of characters in a word or string;
  • An unterminated character or string literal;
  • End of file inside a comment.
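A minimal scanner sketch for a hypothetical small C-like language, built on Python's `re` module. It drops whitespace and comments, yields tokens, and reports three of the lexical errors listed above (illegal character, unterminated string literal, end of file inside a comment). The token classes are assumptions; length limits, nested comments, and macro expansion are not handled.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    text: str
    line: int

class LexicalError(Exception):
    pass

# Hypothetical token classes for a small C-like language (order matters:
# "/*" must be tried before the "/" operator).
TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t\r\n]+)
  | (?P<comment>/\*)
  | (?P<number>\d+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<string>"[^"\n]*")
  | (?P<op>[-+*/=();])
""", re.VERBOSE)

def scan(src: str) -> Iterator[Token]:
    """Yield tokens, skipping whitespace and (non-nested) comments."""
    pos, line = 0, 1
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            if src[pos] == '"':
                raise LexicalError(f"line {line}: unterminated string literal")
            raise LexicalError(f"line {line}: illegal character {src[pos]!r}")
        kind, text = m.lastgroup, m.group()
        if kind == "comment":
            end = src.find("*/", m.end())
            if end == -1:
                raise LexicalError(f"line {line}: end of file inside comment")
            text = src[pos:end + 2]
        if kind not in ("ws", "comment"):
            yield Token(kind, text, line)
        line += text.count("\n")
        pos += len(text)

print([t.text for t in scan("x = 42; /* note */ y = x * 2;")])
# ['x', '=', '42', ';', 'y', '=', 'x', '*', '2', ';']
```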

Syntax analysis (parsing)

The parser converts the sequence of tokens into an abstract syntax tree. Each tree node is stored as an object with named fields, many of which are themselves tree nodes. There are no cycles at this stage. When writing a parser, pay attention to the complexity class of the grammar (LL or LR) and find out whether there are any disambiguation rules. Some languages actually require semantic analysis during parsing.

Errors encountered at this stage are called syntax errors. For example:

  • K = 5 * (7 - y;
  • J = / 5;
  • 56 = x * 4.
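The syntax errors above can be reproduced with a small recursive-descent parser, an LL technique. This sketch assumes a hypothetical grammar of single assignment statements, with `*` and `/` binding tighter than `+` and `-`, and builds the abstract syntax tree from dataclass nodes with named fields. The grammar and the class names are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Union

class ParseError(Exception):
    pass

# AST nodes: objects with named fields (hypothetical class names)
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

@dataclass
class Assign:
    target: str
    value: "Node"

Node = Union[Num, Var, BinOp]
IDENT = re.compile(r"[A-Za-z_]\w*")

# Hypothetical grammar, parsed by recursive descent:
#   stmt   := IDENT "=" expr ";"
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | IDENT | "(" expr ")"
class Parser:
    def __init__(self, src: str) -> None:
        self.tokens: List[str] = re.findall(r"\d+|[A-Za-z_]\w*|\S", src)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<eof>"

    def advance(self) -> str:
        self.pos += 1
        return self.tokens[self.pos - 1]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self) -> Assign:
        if not IDENT.fullmatch(self.peek()):
            raise ParseError(f"cannot assign to {self.peek()!r}")
        target = self.advance()
        self.expect("=")
        value = self.expr()
        self.expect(";")
        self.expect("<eof>")          # nothing may follow the statement
        return Assign(target, value)

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            node = BinOp(self.advance(), node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = BinOp(self.advance(), node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.peek()
        if tok.isdecimal():
            return Num(int(self.advance()))
        if IDENT.fullmatch(tok):
            return Var(self.advance())
        if tok == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        raise ParseError(f"unexpected {tok!r}")

print(Parser("k = 5 * (7 - y);").statement())
# Assign(target='k', value=BinOp(op='*', left=Num(value=5), right=BinOp(op='-', left=Num(value=7), right=Var(name='y'))))
for bad in ["K = 5 * (7 - y;", "J = / 5;", "56 = x * 4;"]:
    try:
        Parser(bad).statement()
    except ParseError as err:
        print(f"{bad!r}: {err}")
```

For the three faulty statements this reports a missing `)`, an unexpected `/`, and an invalid assignment target.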

Semantic analysis

During semantic analysis, the translator checks the legality rules and links the parts of the syntax tree together (resolving name references, inserting implicit type conversions, etc.) to form a semantic graph.

Naturally, the legality rules differ from language to language. When compiling Java-like languages, translators can detect:

  • Multiple declarations of a variable within its scope;
  • Reference to a variable before its declaration;
  • References to an undeclared name;
  • Violations of accessibility rules;
  • Too many or too few arguments in a method call;
  • Type mismatch.
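A sketch of three of these checks (duplicate declaration, use before declaration, undeclared name) using a stack of scopes as the symbol table. The program format, nested `("decl", name)`, `("use", name)` and `("block", [...])` tuples, is a hypothetical stand-in for an already-parsed syntax tree. Accessibility, argument counts, and type checks would need type information that this sketch does not model.

```python
from typing import List, Set, Tuple, Union

# Hypothetical parsed program: ("decl", name) | ("use", name) | ("block", [stmts])
Stmt = Tuple[str, Union[str, list]]

def check(program: List[Stmt]) -> List[str]:
    """Report scope errors using a stack of scopes as the symbol table."""
    errors: List[str] = []
    declared: List[Set[str]] = []   # names declared so far, innermost scope last
    upcoming: List[Set[str]] = []   # all names each open block will declare

    def visit(block: List[Stmt]) -> None:
        declared.append(set())
        upcoming.append({arg for kind, arg in block if kind == "decl"})
        for kind, arg in block:
            if kind == "decl":
                if arg in declared[-1]:
                    errors.append(f"'{arg}' is already declared in this scope")
                declared[-1].add(arg)
            elif kind == "use":
                if any(arg in scope for scope in declared):
                    continue                        # the name resolves
                if any(arg in scope for scope in upcoming):
                    errors.append(f"'{arg}' is used before its declaration")
                else:
                    errors.append(f"'{arg}' is not declared")
            else:                                   # a nested block opens a scope
                visit(arg)
        declared.pop()
        upcoming.pop()

    visit(program)
    return errors

program = [("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "y"),
           ("decl", "x"), ("block", [("use", "z"), ("decl", "w")]),
           ("use", "w")]
for message in check(program):
    print(message)
# 'y' is used before its declaration
# 'x' is already declared in this scope
# 'z' is not declared
# 'w' is not declared
```

The last error arises because `w` goes out of scope when its block closes.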

Generation

Intermediate code generation produces a flow graph made of tuples grouped into basic blocks.
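A sketch of how tuples can be grouped into basic blocks and connected into a flow graph, using the classic "leaders" rule: the first instruction, every jump target, and every instruction right after a jump each start a new block. The tuple format, with `label`, `goto`, and `iffalse` instructions, is an assumption for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical three-address tuples, e.g. ("+", "s", "s", "i"),
# ("label", "L1"), ("goto", "L1"), ("iffalse", "t1", "L2")
Instr = Tuple[object, ...]
Block = List[Instr]
JUMPS = ("goto", "iffalse")

def basic_blocks(code: List[Instr]) -> List[Block]:
    """Split a list of tuples into basic blocks at the 'leaders'."""
    if not code:
        return []
    leaders = {0}                              # the first instruction
    for i, ins in enumerate(code):
        if ins[0] == "label":
            leaders.add(i)                     # a jump target
        if ins[0] in JUMPS and i + 1 < len(code):
            leaders.add(i + 1)                 # the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]

def flow_graph(blocks: List[Block]) -> Dict[int, List[int]]:
    """Map each block index to its successors (jump target, fall-through)."""
    block_of_label = {b[0][1]: n for n, b in enumerate(blocks) if b[0][0] == "label"}
    graph: Dict[int, List[int]] = {}
    for n, block in enumerate(blocks):
        last = block[-1]
        succ: List[int] = []
        if last[0] in JUMPS:
            succ.append(block_of_label[last[-1]])
        if last[0] not in ("goto", "return") and n + 1 < len(blocks):
            succ.append(n + 1)                 # fall through to the next block
        graph[n] = succ
    return graph

# s = 1 + 2 + ... + n
code = [
    ("=", "s", 0),
    ("=", "i", 1),
    ("label", "L1"),
    ("<=", "t1", "i", "n"),
    ("iffalse", "t1", "L2"),
    ("+", "s", "s", "i"),
    ("+", "i", "i", 1),
    ("goto", "L1"),
    ("label", "L2"),
    ("return", "s"),
]
blocks = basic_blocks(code)
print(len(blocks), flow_graph(blocks))  # 4 {0: [1], 1: [3, 2], 2: [1], 3: []}
```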

Code generation produces actual machine code. In traditional compilers for RISC machines, the first stage produces assembly code with an unlimited number of virtual registers. For CISC machines, this is less likely to be the case.
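To illustrate the RISC-style first stage, the sketch below walks an expression tree and gives every intermediate value a fresh virtual register (`v0`, `v1`, ...), leaving the mapping onto a finite set of real registers to a later register-allocation pass. The mnemonics are MIPS/RISC-V-flavored pseudo-assembly, not exact syntax for any real assembler, and the tree format is hypothetical.

```python
from itertools import count
from typing import Iterator, List

# Hypothetical expression trees: ("num", n) | ("var", name) | (op, left, right)
OPCODES = {"+": "add", "-": "sub", "*": "mul"}

def gen_code(e: tuple, out: List[str], regs: Iterator[int]) -> str:
    """Append pseudo-RISC instructions for e to `out` and return the virtual
    register that holds its value; every intermediate gets a fresh register."""
    if e[0] == "num":
        r = f"v{next(regs)}"
        out.append(f"li   {r}, {e[1]}")        # load immediate
        return r
    if e[0] == "var":
        r = f"v{next(regs)}"
        out.append(f"lw   {r}, {e[1]}")        # load variable from memory
        return r
    left = gen_code(e[1], out, regs)
    right = gen_code(e[2], out, regs)
    r = f"v{next(regs)}"
    out.append(f"{OPCODES[e[0]]:<4} {r}, {left}, {right}")
    return r

code: List[str] = []
result = gen_code(("+", ("*", ("var", "a"), ("var", "b")), ("num", 1)), code, count())
print("\n".join(code))
# lw   v0, a
# lw   v1, b
# mul  v2, v0, v1
# li   v3, 1
# add  v4, v2, v3
```

Because registers are never reused, the generator stays simple; the cost is shifted to the later stage that allocates real registers.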

Copyright © 2018 en.delachieve.com. Theme powered by WordPress.