Character encoding (Generation I)

From Bulbapedia, the community-driven Pokémon encyclopedia.
Revision as of 10:15, 12 April 2020

This article is incomplete.
Please feel free to edit this article to add missing information and complete it.
Reason: French, German, Italian, and Spanish character encodings

The Generation I games use a proprietary character encoding to store text data. Versions of the games in different languages may use different encodings, and some differ more than others.

Fixed-length user-input strings are terminated with 0x50. If a fixed-length string is terminated before using its full capacity, the contents of the remaining space are not specified.
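As a minimal sketch of the rule above, the following Python helper cuts a fixed-length field at the first 0x50 byte and ignores whatever follows. The function name and the sample field are illustrative assumptions, not taken from the games' code.

```python
TERMINATOR = 0x50  # the "end" byte that terminates a string


def read_fixed_string(field: bytes) -> bytes:
    """Return the encoded characters before the first 0x50 terminator.

    Bytes after the terminator are unspecified leftovers and are ignored.
    If no terminator is present, the whole field is returned.
    """
    end = field.find(bytes([TERMINATOR]))
    return field if end == -1 else field[:end]


# Hypothetical 11-byte name field: three letters, a terminator, then leftovers.
field = bytes([0x91, 0x84, 0x83, 0x50, 0x8F, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00])
print(read_fixed_string(field).hex())  # 918483
```

Only the first terminator matters; the second 0x50 in the sample is part of the unspecified remainder.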

Note that 0x7F is a space (" "), not empty. All characters that are not control characters print in one character.

In some contexts, some characters may display differently than suggested below. For example, in the character input table, ED is 0xF0 instead of the Pokémon Dollar symbol, and in the Pokédex (in English), the feet (') and inches (") marks are 0x60 and 0x61.

English

Bytes with a dark gray background are not normally used in the English games. Characters with a light gray background are holdovers from the Japanese games that are not used in the English games.

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0- NULL
1- Junk
2-
3-
4- Control characters
5- Control characters
6- A B C D E F G H I V S L M :
7- Text box borders
8- A B C D E F G H I J K L M N O P
9- Q R S T U V W X Y Z ( ) : ; [ ]
A- a b c d e f g h i j k l m n o p
B- q r s t u v w x y z é 'd 'l 's 't 'v
C- Junk
D-
E- ' PK MN - 'r 'm ? ! .
F- $ × . / , 0 1 2 3 4 5 6 7 8 9

In the Japanese games (as can be seen below), 0xF2 is distinguishable from 0xE8, with the former meant as a decimal point while the latter is punctuation. Presumably this intention was largely inherited when the English games were made, as most of the game's script uses 0xE8 exclusively; however, 0xF2 appears in the character table for user input, meaning it may appear in user-input names (and, conversely, 0xE8 never should).

The full list of characters available for user input is: A-Z, a-z, space, and the following: × ( ) : ; [ ] PK MN - ? ! ♂ ♀ / . , (where PK and MN are each a single character).
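A partial decoder for the English encoding could look like the sketch below. It covers only the letters, brackets, é, digits, space, and the two periods; every other byte is shown as a hexadecimal placeholder. The function names are hypothetical, and the digit positions (0xF6-0xFF) assume that the digits fill the end of the F- row.

```python
def build_english_table() -> dict:
    """Build a partial byte-to-character map for the English games."""
    table = {0x7F: " "}
    # Rows 8- and 9-: uppercase letters followed by ( ) : ; [ ]
    for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ():;[]"):
        table[0x80 + i] = ch
    # Rows A- and B-: lowercase letters followed by é
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzé"):
        table[0xA0 + i] = ch
    # Assumption: the ten digits occupy the last ten cells of row F-
    for i, ch in enumerate("0123456789"):
        table[0xF6 + i] = ch
    table[0xE8] = "."  # punctuation period used by the game's script
    table[0xF2] = "."  # decimal point, the period offered for user input
    return table


def decode(data: bytes, table: dict) -> str:
    """Decode bytes up to the 0x50 terminator; unknown bytes become <XX>."""
    out = []
    for b in data:
        if b == 0x50:
            break
        out.append(table.get(b, f"<{b:02X}>"))
    return "".join(out)


def has_script_period(name: bytes) -> bool:
    """True if a name contains 0xE8 before its terminator."""
    return 0xE8 in name.split(b"\x50", 1)[0]


table = build_english_table()
print(decode(bytes([0x80, 0x92, 0x87, 0x50]), table))  # ASH
```

Because both periods decode to the same character, a check such as `has_script_period` has to run on the raw bytes; it flags names that could not have come from the input table.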

Tilemap sections

The game sections off various areas of the tilemap loaded into VRAM and each character code directly corresponds to a tile in the tilemap. Not all tiles in the tilemap are accessible via character code, but many are.

  1. VRAM addresses 0x9000 to 0x9480 correspond to a portion of the current tileset of the map. Character codes 0x01 to 0x48 and 0x4D directly correspond to them. For example, while the player is outside, tile #3 is the animated flower so character code 0x03 will place the animated flower in text, but in other locations (such as in battle or in a cave), a completely different tile will be displayed.
    1. Characters 0x49 - 0x5F are also in this same section, but with the exception of 0x4D, they are control characters that link to code rather than the tile they would normally correspond to.
  2. VRAM addresses 0x9600 to 0x97F0 correspond to characters 0x60-0x7F. This is where the user interface tiles are stored, such as bold letters and the tiles used to draw borders for text boxes and menus. The space character is also in this range. These tiles can sometimes change, meaning that characters that reference them may print a different tile image; however, they are far more consistent than tiles in the 0x9000 to 0x9480 range.
  3. VRAM addresses 0x8800 to 0x8BF0 correspond to characters 0x80-0xBF. This is where the main font is placed when rendering text.
  4. VRAM addresses 0x8C00 to 0x8FF0 are split into 2 tile sections:
    1. The range 0xC0-0xDF is reserved for certain areas that need extra space for extra tiles. As such, they are usually unoccupied, so normally only print blank characters. The player info screen is an example of a screen that uses some of this space.
    2. The range 0xE0-0xFF includes numbers, some symbols, and more user interface characters. The player-enterable characters PK, MN, and gender symbols are also stored here.
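The address ranges in this list follow a simple pattern, sketched below in Python. The sketch assumes 16 bytes per tile (the size of an 8×8 Game Boy tile at 2 bits per pixel) and infers the formula from the ranges given in items 1 to 3; the function name is illustrative.

```python
TILE_SIZE = 16  # bytes per 8x8 tile at 2 bits per pixel


def tile_address(code: int) -> int:
    """Return the VRAM address of the tile that a character code refers to."""
    if not 0 <= code <= 0xFF:
        raise ValueError("character code must fit in one byte")
    if code < 0x80:
        # Codes 0x00-0x7F: map tileset and user interface tiles
        return 0x9000 + code * TILE_SIZE
    # Codes 0x80-0xFF: main font, spare tiles, numbers and symbols
    return 0x8800 + (code - 0x80) * TILE_SIZE


for code in (0x48, 0x60, 0x7F, 0x80, 0xBF):
    print(f"0x{code:02X} -> 0x{tile_address(code):04X}")
```

The printed addresses for 0x48, 0x60, 0x7F, 0x80, and 0xBF match the endpoints quoted in the list.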

Character codes

Control characters are within the 0x49-0x5F range, with the exception of 0x4D, which is not a control character and displays tile 0x4D.

Control characters work by intercepting the tile that would normally correspond to the character code and performing a different action instead, such as ending the text or printing a lengthy message.

Dialogue control codes

These control codes control dialogue text placement, paging, etc.

  • 0x49 - "page" - Begins a new Pokédex page
  • 0x4B - "_cont" - Stops and waits for confirmation before scrolling the dialogue down by 1
  • 0x4C - "autocont" - Scroll dialogue down 1 without waiting for confirmation
  • 0x4E - "next line" - Move a line down in dialogue
  • 0x4F - "bottom line" - Write at the last line of dialogue
  • 0x50 - "end" - Marks the end of a string
  • 0x51 - "paragraph" - Begin a new dialogue page with button confirmation
  • 0x55 - "cont" - A variation of 0x4B and 0x4C
  • 0x57 - "done" - Ends text box
  • 0x58 - "prompt" - Prompts to end text box
  • 0x5F - "dex" - Displays a period and ends the Pokédex entry
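To illustrate how these codes sit inside ordinary text, the sketch below splits an encoded string into runs of printable bytes and named dialogue control codes. It only labels the codes listed above and does not reproduce their behavior; the function name and token format are assumptions made for the example.

```python
DIALOGUE_CODES = {
    0x49: "page", 0x4B: "_cont", 0x4C: "autocont", 0x4E: "next line",
    0x4F: "bottom line", 0x50: "end", 0x51: "paragraph", 0x55: "cont",
    0x57: "done", 0x58: "prompt", 0x5F: "dex",
}


def split_dialogue(data: bytes) -> list:
    """Split encoded text into ("text", bytes) and ("control", name) tokens."""
    tokens = []
    run = bytearray()
    for b in data:
        name = DIALOGUE_CODES.get(b)
        if name is None:
            run.append(b)  # ordinary byte: extend the current text run
            continue
        if run:
            tokens.append(("text", bytes(run)))
            run.clear()
        tokens.append(("control", name))
    if run:
        tokens.append(("text", bytes(run)))
    return tokens


# Two letters, a line break, one letter, then the end of the text box.
print(split_dialogue(bytes([0x87, 0x88, 0x4E, 0x80, 0x57])))
```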

Variable control codes

These control codes print text defined elsewhere.

  • 0x52 - "players name" - The player's name
  • 0x53 - "rivals name" - The rival's name
  • 0x59 - "target" - In battle, the target of a move. If the dialogue is referring to the opponent's Pokémon, "Enemy " will be prepended to the Pokémon's name; if referring to the player's Pokémon, it will just display the Pokémon's name. Outside of battle, it will retain the last value that was stored in it in-battle.
  • 0x5A - "user" - In battle, the user of a move. Just like "target", "Enemy " will be prepended to the name of opposing Pokémon.
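The substitution rules above can be sketched as follows. The `state` dictionary and its keys are hypothetical stand-ins for wherever the game keeps these names, and the sketch works on already decoded strings rather than encoded bytes.

```python
def battle_name(name: str, is_opponent: bool) -> str:
    """Prepend "Enemy " when the Pokémon belongs to the opponent."""
    return ("Enemy " + name) if is_opponent else name


def variable_text(code: int, state: dict) -> str:
    """Return the text that a variable control code stands for."""
    if code == 0x52:
        return state["player"]
    if code == 0x53:
        return state["rival"]
    if code == 0x59:
        return battle_name(*state["target"])
    if code == 0x5A:
        return battle_name(*state["user"])
    raise ValueError(f"0x{code:02X} is not a variable control code")


# Hypothetical battle state: (name, belongs to opponent) pairs for target and user.
state = {
    "player": "RED",
    "rival": "BLUE",
    "target": ("PIDGEY", True),
    "user": ("PIKACHU", False),
}
print(variable_text(0x59, state))  # Enemy PIDGEY
print(variable_text(0x5A, state))  # PIKACHU
```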

Text control codes

These control codes print a hardcoded string. They are used to decrease the number of bytes to write common strings while still rendering as the correct number of characters.

  • 0x4A - "pkmn" - Prints "PKMN"
  • 0x54 - "poke" - Prints "Poké"
  • 0x56 - "......" - Prints 2 ellipses, "……"
  • 0x5B - "pc" - Prints "PC"
  • 0x5C - "tm" - Prints "TM"
  • 0x5D - "trainer" - Prints "TRAINER"
  • 0x5E - "rocket" - Prints "ROCKET"
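A short sketch of the space saving: each code below is a single byte in the stored text but expands to the full string when printed. The function name is illustrative, and bytes that are not text control codes are left as hexadecimal placeholders rather than decoded.

```python
TEXT_CODES = {
    0x4A: "PKMN", 0x54: "Poké", 0x56: "……", 0x5B: "PC",
    0x5C: "TM", 0x5D: "TRAINER", 0x5E: "ROCKET",
}


def expand_text_codes(data: bytes) -> str:
    """Replace text control codes with their strings; mark other bytes as <XX>."""
    return "".join(TEXT_CODES.get(b, f"<{b:02X}>") for b in data)


# Three stored bytes: "rocket", a space (0x7F), "trainer".
print(expand_text_codes(bytes([0x5E, 0x7F, 0x5D])))  # ROCKET<7F>TRAINER
```

Here three stored bytes produce fourteen printed characters once the space is decoded.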

French & German

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0-
1- Unsure
2-
3-
4-
5-
6- A B C D E F G H I V S L M :
7- Text box borders
8- A B C D E F G H I J K L M N O P
9- Q R S T U V W X Y Z ( ) : ; [ ]
A- a b c d e f g h i j k l m n o p
B- q r s t u v w x y z à è é ù ß ç
C- Ä Ö Ü ä ö ü ë ï â ô û ê î
D- c' d' j' l' m' n' p' s' 's t' u' y'
E- ' PK MN - + ? ! .
F- $ × . / , 0 1 2 3 4 5 6 7 8 9

Italian & Spanish

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0-
1- Unsure
2-
3-
4-
5-
6- A B C D E F G H I V S L M :
7- Text box borders
8- A B C D E F G H I J K L M N O P
9- Q R S T U V W X Y Z ( ) : ; [ ]
A- a b c d e f g h i j k l m n o p
B- q r s t u v w x y z à è é ù À Á
C- Ä Ö Ü ä ö ü È É Ì Í Ñ Ò Ó Ù Ú á
D- ì í ñ ò ó ú º & 'd 'l 'm 'r 's 't 'v
E- ' PK MN - ¿ ¡ ? ! .
F- $ × . / , 0 1 2 3 4 5 6 7 8 9


The lowercase 'm' (0xAC) in the French, German, Italian, and Spanish versions is stylized differently from the one in the English version.

Japanese

Technically all characters under 0x60 are control characters, the majority of which have the behavior of causing a specific character from the main font (0x80-0xFF) to be printed with a diacritic in the space above it. Those characters that have different, more complicated functions are detailed below.

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0- NULL イ゙ エ゙ オ゙
1- ナ゙ ニ゙ ヌ゙ ネ゙ ノ゙ マ゙ ミ゙ ム゙
2- ィ゙ あ゙ い゙ え゙ お゙
3- な゙ に゙ ぬ゙ ね゙ の゙ ま゙
4- ま゚ Control も゚ Control
5- Control characters
6- A B C D E F G H I V S L M
7- Text box borders
8-
9-
A-
B-
C-
D-
E- ? !
F- × . / 0 1 2 3 4 5 6 7 8 9

0xE4 and 0xE5 cause the following character to be printed with that diacritic above it.

Japanese control characters

This section is incomplete.
Please feel free to edit this section to add missing information and complete it.
Reason: Incomplete or missing functions for control bytes. Alternate defaults in different games/other languages
  • 0x4A: Prints
  • 0x52: Prints the player's name.
  • 0x53: Prints the rival's name.
  • 0x54: Prints ポケモン in Japanese games.
  • 0x59: Prints the inactive Pokémon's name in battle. (In specific circumstances, the game may "pretend" that the inactive Pokémon is actually active and vice versa.)
    • てきの  in Japanese games.
  • 0x5A: Prints the active Pokémon's name in battle. The default value is empty. (In specific circumstances, the game may "pretend" that the active Pokémon is actually inactive and vice versa.)
  • 0x5B: Prints パソコン in Japanese games.
  • 0x5C: Prints わざマシン in Japanese games.
  • 0x5D: Prints トレーナー in Japanese games.
  • 0x5E: Prints ロケットだん in Japanese games.

