Encoding Detector

Detect text file character encoding (UTF-8, UTF-16, ASCII, Latin-1).


1. Upload text file: Drop or select a text file to analyze.

2. View encoding result: See the detected encoding, BOM status, and confidence level.

3. Preview content: View a preview of the decoded text content.

What Is Encoding Detector?

Encoding Detector is a tool that analyzes a text file to determine its character encoding, which is needed to interpret the file's bytes correctly. Developers use it when a file arrives with an unknown or undeclared encoding, a common cause of errors when reading or processing text.

It checks for a Byte Order Mark (BOM) and, when none is present, applies heuristics based on byte patterns such as null bytes and high-byte sequences to identify UTF-8, UTF-16, and ISO-8859-1/Windows-1252. It also shows a decoded preview of up to 200 characters from the file so that you can verify the detected encoding.

Detection starts with the first few bytes of the file, where a BOM would appear. If no BOM is found, the tool analyzes the byte sequence to estimate whether the file is UTF-8, UTF-16, or another supported encoding; the distribution of null bytes, for example, can reveal UTF-16 without a BOM. The result includes the detected encoding, a confidence level, and an explanation of how the detection was made.
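
The BOM check can be illustrated with a short Python sketch. This is not the tool's actual code: the `BOMS` table and the `detect_bom` function are hypothetical names, and only the byte sequences come from the Technical Guide on this page. The sketch tests the 4-byte UTF-32 marks before the 2-byte UTF-16 marks, because FF FE 00 00 starts with FF FE.

```python
# Hypothetical BOM table (names are illustrative, byte values are standard).
# Longer marks come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32 BE"),
    (b"\xff\xfe\x00\x00", "UTF-32 LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16 BE"),
    (b"\xff\xfe", "UTF-16 LE"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or None if no known BOM is found."""
    head = data[:4]  # a BOM is at most 4 bytes long
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None
```

One ambiguity remains with this ordering: a UTF-16 LE file whose first character after the BOM is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32 LE. Such files are rare in practice, but the case shows why even a BOM match is best treated as high confidence rather than proof.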

Why Use Encoding Detector?

  • Detects encoding via BOM and heuristic byte analysis.
  • Supports UTF-8, UTF-16, UTF-32, ASCII, and Latin-1/Windows-1252.
  • Shows confidence level and detection method details.
  • Includes decoded content preview to verify detection accuracy.

Common Use Cases

Character Issues

Diagnose mojibake and character display issues by identifying the correct file encoding.

Data Import

Determine file encoding before importing text data to ensure correct character handling.

Legacy Files

Identify encoding of legacy text files that may use non-UTF-8 encodings.

Development

Verify encoding of source code files, CSV data, and configuration files.

Technical Guide

The detector uses a multi-stage approach:

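1. BOM Detection: Checks the first 4 bytes for known BOM sequences (UTF-8: EF BB BF, UTF-16 LE: FF FE, UTF-16 BE: FE FF, UTF-32 LE: FF FE 00 00, UTF-32 BE: 00 00 FE FF). Because the UTF-32 LE BOM begins with the UTF-16 LE BOM, the longer UTF-32 sequences need to be tested before the UTF-16 ones. BOM presence provides high-confidence detection.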

2. UTF-16 Heuristic: Analyzes null byte patterns. When mostly-ASCII text is encoded as UTF-16, every other byte is null: the high byte of each 16-bit unit, which is the second byte of each pair in UTF-16 LE and the first byte in UTF-16 BE.

3. UTF-8 Validation: Validates multi-byte sequences. Valid UTF-8 has specific patterns: 110xxxxx 10xxxxxx for 2-byte, 1110xxxx 10xxxxxx 10xxxxxx for 3-byte, and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx for 4-byte sequences.

4. ASCII Detection: If all bytes are in the 0x00-0x7F range, the file is pure ASCII (which is also valid UTF-8).

5. Latin-1 Fallback: If bytes exist in the 0x80-0xFF range but don't form valid UTF-8 sequences, ISO-8859-1/Windows-1252 is likely.

Only the first 8KB of the file is analyzed for performance.
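
Stages 2 to 5 can be sketched in Python for data that has no BOM. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names, the `null_ratio` threshold, and the 4:1 imbalance test are all hypothetical choices. The UTF-8 check is structural only; it follows the bit patterns listed above and does not reject overlong forms or surrogate code points, which a strict validator would.

```python
SAMPLE_SIZE = 8192  # the page states that only the first 8KB is analyzed


def looks_like_utf8(sample: bytes, truncated: bool = False) -> bool:
    """Structural check of lead and continuation byte patterns."""
    i, n = 0, len(sample)
    while i < n:
        b = sample[i]
        if b < 0x80:
            need = 0                      # 0xxxxxxx: single byte
        elif b >> 5 == 0b110:
            need = 1                      # 110xxxxx + 1 continuation byte
        elif b >> 4 == 0b1110:
            need = 2                      # 1110xxxx + 2 continuation bytes
        elif b >> 3 == 0b11110:
            need = 3                      # 11110xxx + 3 continuation bytes
        else:
            return False                  # stray continuation or invalid lead
        tail = sample[i + 1 : i + 1 + need]
        if any(c >> 6 != 0b10 for c in tail):
            return False                  # continuation bytes must be 10xxxxxx
        if len(tail) < need:
            # Sequence runs past the end: acceptable only if the sample
            # was cut off at the size limit.
            return truncated
        i += 1 + need
    return True


def guess_without_bom(data: bytes, null_ratio: float = 0.4) -> str:
    """Guess an encoding for BOM-less data (thresholds are assumptions)."""
    sample = data[:SAMPLE_SIZE]
    pairs = len(sample) // 2
    even_nulls = sample[0 : 2 * pairs : 2].count(0)
    odd_nulls = sample[1 : 2 * pairs : 2].count(0)
    threshold = pairs * null_ratio
    # ASCII text in UTF-16 LE looks like 'h' 00 'i' 00; in BE like 00 'h' 00 'i'.
    if pairs and odd_nulls >= threshold and odd_nulls > 4 * even_nulls:
        return "UTF-16 LE"
    if pairs and even_nulls >= threshold and even_nulls > 4 * odd_nulls:
        return "UTF-16 BE"
    if all(b < 0x80 for b in sample):
        return "ASCII"
    if looks_like_utf8(sample, truncated=len(data) > SAMPLE_SIZE):
        return "UTF-8"
    return "ISO-8859-1/Windows-1252"
```

The `truncated` flag addresses a side effect of sampling a fixed number of bytes: the 8KB boundary can fall in the middle of a multi-byte character, and without that allowance a valid UTF-8 file could be pushed into the Latin-1 fallback.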

Tips & Best Practices

1. BOM detection provides the highest confidence: a file that starts with a BOM is identified directly from it.
2. UTF-8 without BOM is detected by validating multi-byte sequences.
3. ISO-8859-1 and Windows-1252 are detected as a fallback when UTF-8 validation fails.
4. The content preview helps verify that the detection is correct. Garbled characters in the preview suggest a wrong result.
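
To see why the preview is a useful check, the Python sketch below decodes the same bytes with a correct and an incorrect encoding. The `preview` function and `PREVIEW_CHARS` constant are hypothetical names; only the 200-character limit comes from this page, and the encoding argument is a Python codec name rather than the label the tool displays.

```python
PREVIEW_CHARS = 200  # preview length mentioned on this page


def preview(data: bytes, encoding: str, limit: int = PREVIEW_CHARS) -> str:
    """Decode with the given codec and return at most `limit` characters.

    Undecodable bytes become U+FFFD so a wrong guess is visible, not fatal.
    """
    text = data.decode(encoding, errors="replace")
    return text[:limit]


raw = "caf\u00e9".encode("utf-8")    # the bytes 63 61 66 C3 A9
right = preview(raw, "utf-8")        # decodes to "café"
wrong = preview(raw, "latin-1")      # decodes to "cafÃ©" (mojibake)
```

A two-character sequence such as "Ã©" where a single accented letter is expected is the typical sign of UTF-8 bytes read as Latin-1, while replacement characters (U+FFFD) usually point the other way: Latin-1 bytes read as UTF-8.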

Frequently Asked Questions

Q How accurate is the detection?
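BOM-based detection is the most reliable, because a BOM explicitly marks the encoding. Heuristic detection for UTF-8 is very reliable, since text in other encodings rarely forms valid UTF-8 multi-byte sequences. Latin-1/Windows-1252 is reported only as a fallback, so treat that result with more caution.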
Q What is a BOM?
A Byte Order Mark is a special byte sequence at the start of a file that identifies its encoding.
Q Can it detect Shift-JIS or GB2312?
Currently, the detector focuses on Unicode encodings and Latin-1. East Asian encodings are not specifically detected.
Q How much of the file is analyzed?
The first 8KB (8192 bytes) are analyzed, which is sufficient for reliable encoding detection.
Q What about mixed encoding files?
The detector assumes a single encoding per file. Mixed encoding files will show the dominant encoding.

About This Tool

Encoding Detector is a free online tool by FreeToolkit.ai. All processing happens directly in your browser — your data never leaves your device. No registration or installation required.