Character encoding (Generation I): Difference between revisions

From Bulbapedia, the community-driven Pokémon encyclopedia.
(Spanish definitely includes diacritic characters not listed in the English table)
(Expanded completely on the English text engine)
Line 10: Line 10:


===English===
===English===
Those bytes with a dark gray background are not used in the English games and may contain junk data that may cause unexpected behavior. Characters with a light gray background are holdovers from the Japanese game that still print but that are not used in the English game.
 
====Mechanics====
The game reserves several regions of the tile data loaded into VRAM for text, and each character code corresponds directly to one tile in those regions. Not every tile in VRAM can be reached with a character code, but many can.
 
Control characters work by intercepting the code before its tile is drawn: instead of printing the tile that the code would normally select, the text engine performs a different action, such as ending the text or printing a longer string.
 
====Tilemap Sections====
 
# VRAM addresses 0x9000 to 0x9480 hold part of the current map's tileset. Character codes 0x01 to 0x48 and 0x4D correspond directly to these tiles. For example, when you are outdoors, tile 0x03 is the animated flower, so character code 0x03 places the animated flower in text; anywhere else, such as in battle or in a cave, a completely different tile will likely print.
## Character codes 0x49 to 0x5F technically fall in this same section, but all of them except 0x4D are control characters and therefore trigger code rather than printing the tile they would otherwise select.
# VRAM addresses 0x9600 to 0x97F0 partially correspond to character codes 0x60 to 0x7F. This is where the "UI" tiles live, such as a few bold letters and the border artwork for dialogue boxes and menus. The space character is also here. These tiles can sometimes change, so characters that reference them may print a different image, but they are far more consistent than the first section.
# VRAM addresses 0x8800 to 0x8BF0 correspond to character codes 0x80 to 0xBF. This is where the main font is loaded whenever text needs to be rendered.
# VRAM addresses 0x8C00 to 0x8FF0 contain two tile sections:
## 0xC0 to 0xDF appear to be reserved for screens that need room for extra tiles. Most of the time nothing is loaded here, so these codes print blank tiles. The player info screen is one example: it loads tiles into only part of this area, which affects any character codes that reference those tiles.
## 0xE0 to 0xFF reference tiles similar to those in section 2, and can be considered the "other half" of that section. Some player-typeable characters such as "PK", "MN", and the gender symbols are here, along with the numbers, some other symbols, and more UI tiles.
 
====Character Codes====
 
As mentioned above, control codes lie within the 0x49-0x5F range, with the exception of 0x4D, which has no handler and therefore prints tile 0x4D by default. All of these codes are usable in-game, for example in names, and testing never produced a crash apart from the self-referencing name cases noted below. Expect small to large graphical glitches, however; these are usually cleaned up by changing screens or entering a new map through a warp. Long-term use will also be annoying.
 
=====Dialogue control codes=====
 
These codes control dialogue text placement, paging, and so on. They can be used in names, but cause various temporary graphical glitches.
 
* 0x49 - "page" - Begins a new Pokédex page. In a name, it makes the player press a button before the rest of the text is displayed, and causes some serious graphical glitches that can be cleared in the usual way (by changing screens or warping).
* 0x4B - "_cont" - Stops and waits for confirmation before scrolling the dialogue down by one line. In names it behaves like 0x49, with slightly fewer graphical glitches.
* 0x4C - "autocont" - Scrolls the dialogue down one line without waiting for confirmation. In names it is less annoying than 0x4B but still graphically glitchy.
* 0x4E - "next line" - Moves down a line in dialogue. In names it does exactly that: all dialogue moves one line down as soon as the name is reached, causing odd graphical glitches and text that is overwritten or pushed off the screen.
* 0x4F - "bottom line" - Writes at the last line of the dialogue box. In names it causes graphics, particularly dialogue text, to overlap.
* 0x50 - "end" - Used everywhere, including names. Marks the end of the text; nothing after it is read. Without a 0x50, the text engine keeps reading until it reaches one. For player and rival names, the game can crash outright if the engine runs into a control code that inserts that same name, since this causes an infinite loop.
* 0x51 - "paragraph" - Begins a new dialogue page after button confirmation. In names it does exactly that, causing graphical glitches, overlapping dialogue text, and considerable annoyance.
* 0x55 - "cont" - A variation of 0x4B and 0x4C.
* 0x57 - "done" - Ends the text box. In names it causes various graphical glitches.
* 0x58 - "prompt" - Prompts the player to end the text box. In names it behaves similarly to 0x4B, 0x4C, and 0x55.
* 0x5F - "dex" - Ends a Pokédex entry. It simply expands to a period ("."), but is normally used only at the end of Pokédex entries.
 
=====Variable control codes=====
 
These codes expand to text that varies depending on other variables. They are safe to use in names without graphical glitches. However, because they expand to longer text, dialogue can quickly spill over the edges of its container, which temporarily clutters the screen and may overwrite or overlap other text being printed.
 
* 0x52 - "player's name" - Inserts the player's name. It is the only variable that cannot be used in the player's own name, since that leads to an infinite loop that crashes the game. It is safe elsewhere, such as in a Pokémon's name or the rival's name.
* 0x53 - "rival's name" - The counterpart of the player's name: prints the rival's name, and cannot be used in the rival's own name. It works in a Pokémon's nickname, though.
* 0x59 - "target" - Inserts the target's name, which is essentially the Pokémon from your perspective. If the dialogue refers to the enemy's Pokémon, its name is inserted with "Enemy " prepended; if it refers to your Pokémon, only the Pokémon's name is inserted. The last Pokémon you fought is kept in memory, so this works in names even outside of battle. This is the longest expansion of any control character: it can expand to up to 16 characters on its own and will print far off the screen in all cases.
* 0x5A - "user" - The inverse of "target": the Pokémon from the enemy's perspective. In names it will likely just be the enemy you fought, without the "Enemy " prefix.
 
=====Text control codes=====
 
These are like variable control codes, but always expand to the same text.
 
* 0x4A - "pkmn" - Prints "PK" and "MN" using a single character code. It can be used in names to exceed the 7- or 10-character printed limit without exceeding the storage limit.
* 0x54 - "poke" - Prints "POKé" while occupying only one byte. It can be used in names to print more than the 7- or 10-character limit while still fitting in the allotted space.
* 0x56 - "......" - Prints two tiles, each consisting of three dots.
* 0x5B - "pc" - Prints "PC" as two tiles.
* 0x5C - "tm" - Prints "TM" as two tiles.
* 0x5D - "trainer" - Prints "TRAINER" as individual tiles.
* 0x5E - "rocket" - Prints "ROCKET" as individual tiles.
 
Bytes with a dark gray background are not normally used in the English games. Characters with a light gray background are holdovers from the Japanese games that are not used in the English games.
:{| style="text-align: center; border-collapse: collapse" cellpadding="2px" width="375px"
:{| style="text-align: center; border-collapse: collapse" cellpadding="2px" width="375px"
|-
|-
Line 123: Line 180:
0xE4 and 0xE5 cause the following character to be printed with that diacritic above it.
0xE4 and 0xE5 cause the following character to be printed with that diacritic above it.


===Control characters===
===Japanese Control characters===
{{incomplete|section|Incomplete or missing functions for control bytes. Alternate defaults in different games/other languages}}
{{incomplete|section|Incomplete or missing functions for control bytes. Alternate defaults in different games/other languages}}
* 0x49: Used in Pokédex entries to prompt the player to press a button, after which the screen is cleared to make way for the following text.
* 0x4A: Prints <code>が </code>
* 0x4A: Prints <code><sup>P</sup><sub>K</sub><sup>M</sup><sub>N</sub></code> in English games and <code>が </code> in Japanese games.
* 0x4B: ?
* 0x4C: ?
* 0x4E: Used as a line break in Pokédex entries.
* 0x4F: Line break (print position moves to the bottom of the text window).
* 0x50: A string terminator.
* 0x51: Prompts the player to press a button, after which the text window is cleared to make way for the following text.
* 0x52: Prints the player's name.
* 0x52: Prints the player's name.
** In {{game|Yellow}}, the default value is <code><sc>NINTEN</sc></code> in English games and <code>ゲーフリ1</code> in Japanese games.
** In {{game|Yellow}}, the default value is <code>ゲーフリ1</code> in Japanese games.
* 0x53: Prints the rival's name.
* 0x53: Prints the rival's name.
** In {{game|Yellow}}, the default value is <code><sc>SONY</sc></code> in English games and <code>クリチャ</code> in Japanese games.
** In {{game|Yellow}}, the default value is <code>クリチャ</code> in Japanese games.
* 0x54: Prints <code>POKé</code> in English games and <code>ポケモン</code> in Japanese games.
* 0x54: Prints <code>ポケモン</code> in Japanese games.
* 0x55: Prompts the player to press a button, after which the top line of the text window is replaced by the bottom, the bottom line is cleared, and the print position moves to the start of the bottom line.
* 0x56: Prints <code>……</code>.
* 0x57: Marks the end of dialogue, without a visual prompt to the player.
* 0x58: Marks the end of dialogue, with a visual prompt to the player.
* 0x59: Prints the inactive Pokémon's name in battle. (In specific circumstances<!--E.g., Rage-->, the game may "pretend" that the inactive Pokémon is actually active and vice versa.)
* 0x59: Prints the inactive Pokémon's name in battle. (In specific circumstances<!--E.g., Rage-->, the game may "pretend" that the inactive Pokémon is actually active and vice versa.)
** The default value is <code>Enemy </code> in English games and <code>てきの </code> in Japanese games.
** The default value is <code>てきの </code> in Japanese games.
* 0x5A: Prints the active Pokémon's name in battle. The default value is empty. (In specific circumstances, the game may "pretend" that the active Pokémon is actually inactive and vice versa.)
* 0x5A: Prints the active Pokémon's name in battle. The default value is empty. (In specific circumstances, the game may "pretend" that the active Pokémon is actually inactive and vice versa.)
* 0x5B: Prints <code>PC</code> in English games and <code>パソコン</code> in Japanese games.
* 0x5B: Prints <code>パソコン</code> in Japanese games.
* 0x5C: Prints <code>TM</code> in English games and <code>わざマシン</code> in Japanese games.
* 0x5C: Prints <code>わざマシン</code> in Japanese games.
* 0x5D: Prints <code>TRAINER</code> in English games and <code>トレーナー</code> in Japanese games.
* 0x5D: Prints <code>トレーナー</code> in Japanese games.
* 0x5E: Prints <code>ROCKET</code> in English games and <code>ロケットだん</code> in Japanese games.
* 0x5E: Prints <code>ロケットだん</code> in Japanese games.
* 0x5F: Used in Pokédex entries to mark the end of the entry, without a visual prompt to the player.


{{data structure}}<br>
{{data structure}}<br>
{{Project Games notice|data structure}}
{{Project Games notice|data structure}}

Revision as of 00:11, 15 August 2018

This article is incomplete.
Please feel free to edit this article to add missing information and complete it.
Reason: French, German, Italian, and Spanish character encodings

The Generation I games use a proprietary character encoding to store text data. Versions of the games in different languages may use different encodings, some more different than others.

Fixed-length user-input strings are terminated with 0x50. If a fixed-length string is terminated before using its full capacity, the contents of the remaining space are not specified.
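
The terminator rule above can be sketched in a few lines. This is only an illustration, not code from the games: the function name `read_terminated` and the 11-byte field size are assumptions made for the example.

```python
TERMINATOR = 0x50  # string terminator described above


def read_terminated(field: bytes) -> bytes:
    """Return the bytes that precede the first 0x50 in a fixed-length field.

    Anything after the terminator is unspecified leftover data and is ignored.
    """
    end = field.find(bytes([TERMINATOR]))
    if end == -1:
        raise ValueError("field contains no 0x50 terminator")
    return field[:end]


# Hypothetical 11-byte name field: three characters, the terminator,
# then leftover bytes that must not be interpreted as text.
field = bytes([0x91, 0x84, 0x83, 0x50, 0x00, 0xAA, 0x50, 0x13, 0x00, 0x00, 0x00])
name_bytes = read_terminated(field)
print(name_bytes.hex())
```

Only the first terminator matters here; the second 0x50 in the example field is part of the unspecified remainder.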

Character sets

Note that 0x7F is a space (" "), not empty. Every character that is not a control character prints as a single tile.

In some contexts, some characters may display differently than suggested below. For example, in the character input table, ED is 0xF0 instead of the Pokémon Dollar symbol, and in the Pokédex (in English), the feet (') and inches (") marks are 0x60 and 0x61.

English

Mechanics

The game reserves several regions of the tile data loaded into VRAM for text, and each character code corresponds directly to one tile in those regions. Not every tile in VRAM can be reached with a character code, but many can.

Control characters work by intercepting the code before its tile is drawn: instead of printing the tile that the code would normally select, the text engine performs a different action, such as ending the text or printing a longer string.

Tilemap Sections

  1. VRAM addresses 0x9000 to 0x9480 hold part of the current map's tileset. Character codes 0x01 to 0x48 and 0x4D correspond directly to these tiles. For example, when you are outdoors, tile 0x03 is the animated flower, so character code 0x03 places the animated flower in text; anywhere else, such as in battle or in a cave, a completely different tile will likely print.
    1. Character codes 0x49 to 0x5F technically fall in this same section, but all of them except 0x4D are control characters and therefore trigger code rather than printing the tile they would otherwise select.
  2. VRAM addresses 0x9600 to 0x97F0 partially correspond to character codes 0x60 to 0x7F. This is where the "UI" tiles live, such as a few bold letters and the border artwork for dialogue boxes and menus. The space character is also here. These tiles can sometimes change, so characters that reference them may print a different image, but they are far more consistent than the first section.
  3. VRAM addresses 0x8800 to 0x8BF0 correspond to character codes 0x80 to 0xBF. This is where the main font is loaded whenever text needs to be rendered.
  4. VRAM addresses 0x8C00 to 0x8FF0 contain two tile sections:
    1. 0xC0 to 0xDF appear to be reserved for screens that need room for extra tiles. Most of the time nothing is loaded here, so these codes print blank tiles. The player info screen is one example: it loads tiles into only part of this area, which affects any character codes that reference those tiles.
    2. 0xE0 to 0xFF reference tiles similar to those in section 2, and can be considered the "other half" of that section. Some player-typeable characters such as "PK", "MN", and the gender symbols are here, along with the numbers, some other symbols, and more UI tiles.
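
The address ranges in this list follow a simple pattern, which the sketch below reproduces. It assumes 16 bytes per tile (the size of one 8x8 Game Boy tile) and the split at 0x80 implied by the ranges above; `tile_address` is a hypothetical helper name, not a routine from the games.

```python
TILE_SIZE = 16  # bytes per 8x8 tile (assumption stated in the lead-in)


def tile_address(code: int) -> int:
    """Return the VRAM address of the tile selected by a character code.

    Codes 0x00-0x7F index upward from 0x9000; codes 0x80-0xFF index
    upward from 0x8800, which is the same as 0x8000 + code * 16.
    """
    if not 0x00 <= code <= 0xFF:
        raise ValueError("character codes are single bytes")
    base = 0x9000 if code < 0x80 else 0x8000
    return base + code * TILE_SIZE


# Spot checks against the section boundaries listed above.
for code in (0x48, 0x60, 0x7F, 0x80, 0xBF, 0xC0):
    print(f"code 0x{code:02X} -> tile at 0x{tile_address(code):04X}")
```

This only says where a tile would be read from. Control codes in the 0x49-0x5F range never reach this lookup, because the text engine handles them before any tile is drawn.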

Character Codes

As mentioned above, control codes lie within the 0x49-0x5F range, with the exception of 0x4D, which has no handler and therefore prints tile 0x4D by default. All of these codes are usable in-game, for example in names, and testing never produced a crash apart from the self-referencing name cases noted below. Expect small to large graphical glitches, however; these are usually cleaned up by changing screens or entering a new map through a warp. Long-term use will also be annoying.

Dialogue control codes

These codes control dialogue text placement, paging, and so on. They can be used in names, but cause various temporary graphical glitches.

  • 0x49 - "page" - Begins a new Pokédex page. In a name, it makes the player press a button before the rest of the text is displayed, and causes some serious graphical glitches that can be cleared in the usual way (by changing screens or warping).
  • 0x4B - "_cont" - Stops and waits for confirmation before scrolling the dialogue down by one line. In names it behaves like 0x49, with slightly fewer graphical glitches.
  • 0x4C - "autocont" - Scrolls the dialogue down one line without waiting for confirmation. In names it is less annoying than 0x4B but still graphically glitchy.
  • 0x4E - "next line" - Moves down a line in dialogue. In names it does exactly that: all dialogue moves one line down as soon as the name is reached, causing odd graphical glitches and text that is overwritten or pushed off the screen.
  • 0x4F - "bottom line" - Writes at the last line of the dialogue box. In names it causes graphics, particularly dialogue text, to overlap.
  • 0x50 - "end" - Used everywhere, including names. Marks the end of the text; nothing after it is read. Without a 0x50, the text engine keeps reading until it reaches one. For player and rival names, the game can crash outright if the engine runs into a control code that inserts that same name, since this causes an infinite loop.
  • 0x51 - "paragraph" - Begins a new dialogue page after button confirmation. In names it does exactly that, causing graphical glitches, overlapping dialogue text, and considerable annoyance.
  • 0x55 - "cont" - A variation of 0x4B and 0x4C.
  • 0x57 - "done" - Ends the text box. In names it causes various graphical glitches.
  • 0x58 - "prompt" - Prompts the player to end the text box. In names it behaves similarly to 0x4B, 0x4C, and 0x55.
  • 0x5F - "dex" - Ends a Pokédex entry. It simply expands to a period ("."), but is normally used only at the end of Pokédex entries.

Variable control codes

These codes expand to text that varies depending on other variables. They are safe to use in names without graphical glitches. However, because they expand to longer text, dialogue can quickly spill over the edges of its container, which temporarily clutters the screen and may overwrite or overlap other text being printed.

  • 0x52 - "player's name" - Inserts the player's name. It is the only variable that cannot be used in the player's own name, since that leads to an infinite loop that crashes the game. It is safe elsewhere, such as in a Pokémon's name or the rival's name.
  • 0x53 - "rival's name" - The counterpart of the player's name: prints the rival's name, and cannot be used in the rival's own name. It works in a Pokémon's nickname, though.
  • 0x59 - "target" - Inserts the target's name, which is essentially the Pokémon from your perspective. If the dialogue refers to the enemy's Pokémon, its name is inserted with "Enemy " prepended; if it refers to your Pokémon, only the Pokémon's name is inserted. The last Pokémon you fought is kept in memory, so this works in names even outside of battle. This is the longest expansion of any control character: it can expand to up to 16 characters on its own and will print far off the screen in all cases.
  • 0x5A - "user" - The inverse of "target": the Pokémon from the enemy's perspective. In names it will likely just be the enemy you fought, without the "Enemy " prefix.

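
The self-reference problem described for 0x52 and 0x53 can be modeled with a small recursive expander. This is a simplified sketch rather than the game's routine: `expand`, the `names` dictionary, and the `max_depth` guard are all hypothetical. According to the descriptions above, the game itself has no such guard and simply loops until it crashes.

```python
PLAYER, RIVAL, END = 0x52, 0x53, 0x50


def expand(data, names, depth=0, max_depth=8):
    """Expand name-inserting control codes into plain character codes.

    `names` maps a control code (0x52 or 0x53) to the stored bytes of
    that name. A name that inserts itself would recurse forever, so this
    model gives up after `max_depth` nested expansions.
    """
    if depth > max_depth:
        raise RecursionError("a name refers to itself")
    out = []
    for byte in data:
        if byte == END:
            break  # nothing after the terminator is read
        if byte in names:
            out.extend(expand(names[byte], names, depth + 1, max_depth))
        else:
            out.append(byte)
    return out


# Safe: the rival's name inserts the player's name, then one more character.
names = {PLAYER: [0x80, END], RIVAL: [PLAYER, 0x81, END]}
print(expand(names[RIVAL], names))

# Unsafe: the player's name inserts the player's name.
looping = {PLAYER: [PLAYER, END]}
try:
    expand(looping[PLAYER], looping)
except RecursionError as err:
    print("would loop forever in-game:", err)
```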
Text control codes

These are like variable control codes, but always expand to the same text.

  • 0x4A - "pkmn" - Prints "PK" and "MN" using a single character code. It can be used in names to exceed the 7- or 10-character printed limit without exceeding the storage limit.
  • 0x54 - "poke" - Prints "POKé" while occupying only one byte. It can be used in names to print more than the 7- or 10-character limit while still fitting in the allotted space.
  • 0x56 - "......" - Prints two tiles, each consisting of three dots.
  • 0x5B - "pc" - Prints "PC" as two tiles.
  • 0x5C - "tm" - Prints "TM" as two tiles.
  • 0x5D - "trainer" - Prints "TRAINER" as individual tiles.
  • 0x5E - "rocket" - Prints "ROCKET" as individual tiles.
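
To see how these one-byte codes let a name print wider than its stored length, the tile counts from the list above can be summed. The sketch below is an illustration only: `printed_width` is a hypothetical helper, and it assumes the input contains nothing but ordinary one-tile characters, the text control codes listed here, and the 0x50 terminator.

```python
END = 0x50

# Tiles printed by each text control code, taken from the list above.
TILES_PER_CODE = {
    0x4A: 2,  # "PK" + "MN"
    0x54: 4,  # four tiles for the "poke" expansion
    0x56: 2,  # two three-dot tiles
    0x5B: 2,  # "PC"
    0x5C: 2,  # "TM"
    0x5D: 7,  # "TRAINER"
    0x5E: 6,  # "ROCKET"
}


def printed_width(data: bytes) -> int:
    """Count the tiles printed by a name, stopping at the terminator."""
    width = 0
    for byte in data:
        if byte == END:
            break
        width += TILES_PER_CODE.get(byte, 1)  # ordinary characters print one tile
    return width


# Seven stored bytes, each of them the "trainer" code.
seven_trainers = bytes([0x5D] * 7 + [END])
print(len(seven_trainers) - 1, "bytes ->", printed_width(seven_trainers), "tiles")
```

A seven-byte name made entirely of 0x5D therefore prints 49 tiles, far more than fits on one line of dialogue.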

Bytes with a dark gray background are not normally used in the English games. Characters with a light gray background are holdovers from the Japanese games that are not used in the English games.

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0- NULL
1- Junk
2-
3-
4- Control characters
5- Control characters
6- A B C D E F G H I V S L M :
7- Character 0x79 i.png = Character 0x7B i.png || Character 0x7D i.png Character 0x7E i.png
8- A B C D E F G H I J K L M N O P
9- Q R S T U V W X Y Z ( ) : ; [ ]
A- a b c d e f g h i j k l m n o p
B- q r s t u v w x y z é 'd 'l 's 't 'v
C- Junk
D-
E- ' PK MN - 'r 'm ? ! .
F- $ × . / , 0 1 2 3 4 5 6 7 8 9

In the Japanese games (as can be seen below), 0xF2 is distinguishable from 0xE8, with the former meant as a decimal point while the latter is punctuation. Presumably this intention was largely inherited when the English games were made, as most of the game's script uses 0xE8 exclusively; however, 0xF2 appears in the character table for user input, meaning it may appear in user-input names (and, conversely, 0xE8 never should).

The full list of characters available for user input is: A-Z and a-z, space, and the following: ×():;[]PKMN-?!♂♀/.,.
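
As a rough illustration of the English table, the sketch below decodes the letter rows (0x80-0xBA) and the space at 0x7F into Unicode. It is deliberately partial: the function names are hypothetical, and every byte outside those rows is shown as a hex placeholder rather than guessed at.

```python
END = 0x50


def build_table() -> dict:
    """Map the letter rows of the English table, plus the space, to Unicode."""
    table = {0x7F: " "}
    for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
        table[0x80 + i] = ch  # rows 8- and 9-
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[0xA0 + i] = ch  # rows A- and B-
    table[0xBA] = "é"
    return table


def decode(data: bytes, table: dict) -> str:
    """Decode bytes up to the 0x50 terminator; unmapped bytes become <XX>."""
    out = []
    for byte in data:
        if byte == END:
            break
        out.append(table.get(byte, f"<{byte:02X}>"))
    return "".join(out)


table = build_table()
print(decode(bytes([0x91, 0x84, 0x83, 0x50, 0x80]), table))
print(decode(bytes([0x8F, 0xA8, 0xAA, 0xA0, 0x7F, 0xE1, 0x50]), table))
```

The placeholder keeps unmapped bytes visible, which is useful when a name contains control codes or tiles that depend on what is currently loaded in VRAM.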

Japanese

Technically all characters under 0x60 are control characters, the majority of which have the behavior of causing a specific character from the main font (0x80-0xFF) to be printed with a diacritic in the space above it. Those characters that have different, more complicated functions are detailed below.

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0- NULL イ゛ エ゛ オ゛
1- ナ゛ ニ゛ ヌ゛ ネ゛ ノ゛ マ゛ ミ゛ ム゛
2- ィ゛ あ゛ い゛ え゛ お゛
3- な゛ に゛ ぬ゛ ね゛ の゛ ま゛
4- ま゜ Control も゜ Control
5- Control characters
6- A B C D E F G H I V S L M
7- Character 0x79 i.png = Character 0x7B i.png || Character 0x7D i.png Character 0x7E i.png  
8-
9-
A-
B-
C-
D-
E- ? !
F- × . / 0 1 2 3 4 5 6 7 8 9

0xE4 and 0xE5 cause the following character to be printed with that diacritic above it.

Japanese Control characters

This section is incomplete.
Please feel free to edit this section to add missing information and complete it.
Reason: Incomplete or missing functions for control bytes. Alternate defaults in different games/other languages
  • 0x4A: Prints が (followed by a space).
  • 0x52: Prints the player's name.
  • 0x53: Prints the rival's name.
  • 0x54: Prints ポケモン in Japanese games.
  • 0x59: Prints the inactive Pokémon's name in battle. (In specific circumstances, the game may "pretend" that the inactive Pokémon is actually active and vice versa.)
    • The default value is てきの  in Japanese games.
  • 0x5A: Prints the active Pokémon's name in battle. The default value is empty. (In specific circumstances, the game may "pretend" that the active Pokémon is actually inactive and vice versa.)
  • 0x5B: Prints パソコン in Japanese games.
  • 0x5C: Prints わざマシン in Japanese games.
  • 0x5D: Prints トレーナー in Japanese games.
  • 0x5E: Prints ロケットだん in Japanese games.



This data structure article is part of Project Games, a Bulbapedia project that aims to write comprehensive articles on the Pokémon games.