Working with text: how to determine the encoding of a file

Let's find out how to tell which encoding a file uses. In simple terms, an encoding is a table that maps characters, such as the letters of a particular language's alphabet, to byte values. Each language has encodings designed for its alphabet. Sometimes you need to determine which one a file uses. Let's look at this using a text document as an example.
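To make the idea concrete, here is a minimal Python sketch showing that the same word becomes different bytes under different encodings. The sample word and the helper name encode_all are illustrative assumptions, not part of any of the tools discussed in this article.

```python
def encode_all(text, encodings):
    """Encode the same text with each named encoding and return the raw bytes."""
    return {name: text.encode(name) for name in encodings}


# Hypothetical sample: a Russian word and three encodings often used for Russian.
encoded = encode_all("Привет", ("utf-8", "cp1251", "koi8-r"))

for name, raw in encoded.items():
    # Show the byte count and the hexadecimal value of every byte.
    print(f"{name:8} {len(raw):2} bytes  {raw.hex(' ')}")
```

The characters are identical in all three cases; only the bytes stored on disk differ, which is why a program has to know (or guess) the encoding before it can display the text.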

What you need

A set of software tools. To begin with, applications such as Word, KWrite, and Firefox are enough, along with the recognition tool enca.

You can determine the encoding of a file using the Microsoft Word editor. It first has to be installed from the Office suite. Once the application is installed and can be opened from its W-shaped icon on the desktop, go to the next step.

The next stage of recognition

From the application menu, choose "File", then "Open". The keyboard shortcut Ctrl + O does the same thing.

Then, in the dialog box, select the desired directory and the file itself. After selecting the file with the mouse, press the "Open" button.

When the file is in an encoding other than CP1251, the application tries to determine the encoding on its own and displays a list of possible matches. Select one of the encodings from the character sets offered on the right side of the list. If the choice is correct, the text will appear readable in the "Sample" element.
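Why the preview matters can be shown with a small Python sketch. It assumes a hypothetical file whose bytes are in KOI8-R and decodes them once with the wrong guess (CP1251) and once with the right one. It only illustrates the effect; it says nothing about how Word detects encodings internally.

```python
# Hypothetical file contents: Russian text stored on disk as KOI8-R bytes.
original = "Кодировка файла"
raw = original.encode("koi8-r")

# Wrong guess: every byte is valid in CP1251, so no error is raised,
# but the letters that come out are not the ones that went in.
wrong = raw.decode("cp1251")

# Right guess: the original text comes back unchanged.
right = raw.decode("koi8-r")

print("decoded as cp1251:", wrong)
print("decoded as koi8-r:", right)
```

A wrong choice usually does not fail outright; it produces unreadable text, so checking the preview by eye is the actual test.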

How to determine the encoding with KWrite

In addition to the word processor Word, there are other capable utilities. One of them is KWrite (a comparable editor for Unix systems). So that you do not get confused, I'll list the steps for the task "determine the encoding of a document in KWrite".

  1. Open the .txt file in the application.
  2. Try the encodings one by one until the text becomes readable.
  3. To switch encodings for step 2, go to the "Tools" menu and open the "Encoding" item.
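The manual trial-and-error steps above can be sketched in Python as a loop over candidate encodings. The function name try_encodings, the candidate list, and the sample bytes are assumptions chosen for illustration; KWrite itself is not involved.

```python
def try_encodings(raw, candidates):
    """Return (encoding, text) pairs for every candidate that decodes without error."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Hypothetical candidates: encodings commonly used for Russian text.
CANDIDATES = ("utf-8", "cp1251", "koi8-r", "cp866")

# Hypothetical file contents, stored as CP1251.
sample = "Привет, мир".encode("cp1251")

results = try_encodings(sample, CANDIDATES)
for name, text in results:
    print(f"{name:8} {text!r}")
```

Several single-byte encodings will decode the same bytes without complaint, so, just as in the editor, a person still has to pick the result that reads correctly.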

The Mozilla Firefox browser: the goal is the same, to determine the encoding

The principle is about the same as in the text utilities. Launch the browser; if it is not installed, download the installer from mozilla.org.

Then, in the browser window, open the text document through the "File" menu, "Open file" item. If the selected file is displayed without distortion and the text is readable, determining the encoding is not difficult.

To do this, go to "View", then "Encoding". Several character sets are listed there; the one marked with a tick is the encoding the browser detected.

If the text is not displayed correctly, open the "Additional" subsection and experiment with the encodings there, or select the value "Auto".

Specialized software - working with enca

There are also a number of auxiliary tools that make it possible to determine the encoding of plain text.

For those who are accustomed to working under Unix, the enca utility is suitable. It can be installed through the system's package manager. Once you have found it among the available packages, you can start the installation.

To list the languages it can recognize, run the command enca --list languages in the terminal.

To determine the encoding of a text file, enter the file name after the -g option and, in the same way, the recognition language after the -L option:

enca -L russian -g /home/vic/temp/myfile.txt
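If you want to call enca from a script, a minimal Python sketch might look like the following. It assumes enca is installed and on the PATH, and it uses only the -L and -g options shown above. The helper names build_enca_command and guess_encoding are hypothetical, and the exact format of the text enca prints is not assumed here.

```python
import shutil
import subprocess


def build_enca_command(path, language="russian"):
    """Build the same command line as in the article: enca -L <language> -g <path>."""
    return ["enca", "-L", language, "-g", path]


def guess_encoding(path, language="russian"):
    """Run enca and return whatever it prints, or None if it is unavailable or silent."""
    if shutil.which("enca") is None:
        # Assumption: enca is an optional dependency, so report "unknown" instead of failing.
        return None
    completed = subprocess.run(
        build_enca_command(path, language),
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout.strip() or None


command = build_enca_command("/home/vic/temp/myfile.txt")
print(" ".join(command))
```

Calling guess_encoding("/home/vic/temp/myfile.txt") would then return enca's description of the file, provided the file exists and the tool is installed.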

Let's sum up what was said about the encoding

I believe that the utilities above give the user a sufficient set of tools for determining the encoding of text documents.

That, in fact, is all there is to recognizing an encoding. For everyday purposes, I think, the software described is quite suitable. There are more specialized detection methods, but they are beyond the scope of this article.

In Microsoft Word, the source for recognition can be either plain text or a document with complex formatting.

Copyright © 2018 en.delachieve.com. Theme powered by WordPress.