This is the documentation page for Module:data consistency check

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Output

Discrepancies detected:

Module:etymology languages/code to canonical name

  • goh-lng, the code for the canonical name Lombardic, is wrong; it should be lng.

Module:etymology languages/data

  • The data key alias_codes for ??? (lng) is invalid.
  • ภาษามคธ (pra-mag) has a canonical name that is not unique; it is also used by the code mag.
  • The data key preprocess_links for ??? (th-new) is invalid.

Module:families/canonical names

  • The code ira-mid and the canonical name อิเรเนียนกลาง should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name อิเรเนียนเก่า should be removed; they are not found in Module:families/data.

Module:families/code to canonical name

  • The code ira-mid and the canonical name อิเรเนียนกลาง should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name อิเรเนียนเก่า should be removed; they are not found in Module:families/data.

Module:families/data

Module:languages/data/2

Module:languages/data/3/h

Module:languages/data/3/s

Module:scripts/by name

  • Chisoi (Chis) is missing
  • ดิเวส อกุรุ (Diak) is missing
  • Dhives Akuru, the canonical name for the code Diak, is wrong; it should be ดิเวส อกุรุ.
  • Garay (Gara) is missing
  • Khema (Gukh) is missing
  • Pahawh Hmong (Hmng) is missing
  • ม้ง, the canonical name for the code Hmng, is wrong; it should be Pahawh Hmong.
  • IPAchar, the code for the canonical name สัทอักษรสากล, is wrong; it should be Ipach.
  • กวิ (Kawi) is missing
  • Kawi, the canonical name for the code Kawi, is wrong; it should be กวิ.
  • Kirat Rai (Krai) is missing
  • มโร (Mroo) is missing
  • Mro, the canonical name for the code Mroo, is wrong; it should be มโร.
  • Ol Onal (Onao) is missing
  • อุสมาน (Osma) is missing
  • Osmanya, the canonical name for the code Osma, is wrong; it should be อุสมาน.
  • Sidetic (Sidt) is missing
  • Khudawadi, the canonical name for the code Sind, is wrong; it should be คุดาบาด.
  • คุดาบาด (Sind) is missing
  • Sunuwar (Sunu) is missing
  • Lai Tay (Tayo) is missing
  • Todhri (Todr) is missing
  • Tolong Siki (Tols) is missing
  • Tigalari (Tutg) is missing
  • วรังจิติ (Wara) is missing
  • Varang Kshiti, the canonical name for the code Wara, is wrong; it should be วรังจิติ.
  • Tamyig, the canonical name for the code sit-tam-Tibt, is wrong; it should be ตัมยิก.
  • The code xzh-Tibt and the canonical name Zhang-Zhung should be removed; they are not found in a submodule of Module:scripts.

Module:scripts/code to canonical name

  • Chis (Chisoi) is missing
  • Dhives Akuru, the canonical name for the code Diak, is wrong; it should be ดิเวส อกุรุ.
  • Gara (Garay) is missing
  • Gukh (Khema) is missing
  • ม้ง, the canonical name for the code Hmng, is wrong; it should be Pahawh Hmong.
  • IPAchar, the code for the canonical name สัทอักษรสากล, is wrong; it should be Ipach.
  • Kawi, the canonical name for the code Kawi, is wrong; it should be กวิ.
  • Krai (Kirat Rai) is missing
  • Latnx, the code for the canonical name ละติน, is wrong; it should be Latn.
  • Mro, the canonical name for the code Mroo, is wrong; it should be มโร.
  • Onao (Ol Onal) is missing
  • Osmanya, the canonical name for the code Osma, is wrong; it should be อุสมาน.
  • Sidt (Sidetic) is missing
  • Khudawadi, the canonical name for the code Sind, is wrong; it should be คุดาบาด.
  • Sunu (Sunuwar) is missing
  • Tayo (Lai Tay) is missing
  • Todr (Todhri) is missing
  • Tols (Tolong Siki) is missing
  • Tutg (Tigalari) is missing
  • Varang Kshiti, the canonical name for the code Wara, is wrong; it should be วรังจิติ.
  • Tamyig, the canonical name for the code sit-tam-Tibt, is wrong; it should be ตัมยิก.
  • xka-Arab, the code for the canonical name อาหรับ, is wrong; it should be fa-Arab.
  • The code xzh-Tibt and the canonical name Zhang-Zhung should be removed; they are not found in a submodule of Module:scripts.

Module:scripts/data

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
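
The Wikidata ID and other-names rules above can be sketched as follows. This is an illustration in Python, not the module's Lua code: the function names are hypothetical, and reading "starting with Q and ending with decimal digits" as "Q followed only by digits" is an assumption.

```python
import re


def is_valid_wikidata_id(value):
    """Accept a positive integer or a string such as "Q1860" (assumed format)."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False


def check_other_names(canonical_name, other_names):
    """Return a list of problems found in one entry's list of other names."""
    if not isinstance(other_names, list):
        return ["otherNames must be an array"]
    problems = []
    seen = set()
    for name in other_names:
        if name == canonical_name:
            problems.append(f"canonical name {name!r} is repeated in otherNames")
        if name in seen:
            problems.append(f"{name!r} appears more than once in otherNames")
        seen.add(name)
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every discrepancy, as the Output section does.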

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given as a table with from and to arrays, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
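
A few of the per-language checks above can be sketched as follows. This is a Python illustration, not the module's Lua code: a language entry is modelled as a dict, check_language_entry is a hypothetical name, and the message wording is made up for the example.

```python
# Keys permitted in a language entry, copied from the list above.
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}


def check_language_entry(code, data):
    """Return a list of problems for one language entry (a dict)."""
    problems = []

    # Unknown keys are reported one by one.
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"data key {key!r} for {code} is invalid")

    # In a from/to replacement table, "from" must not be shorter than "to".
    for field in ("entry_name", "sort_key"):
        value = data.get(field)
        if isinstance(value, dict):
            if len(value.get("from", [])) < len(value.get("to", [])):
                problems.append(f"{field} for {code}: 'from' is shorter than 'to'")

    # override_translit only makes sense with a transliteration module.
    if data.get("override_translit") and not data.get("translit"):
        problems.append(f"{code} sets override_translit without translit")

    # link_tr, if present, must be exactly true.
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append(f"link_tr for {code} must be true")

    return problems
```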

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
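
The "all the codes and canonical names, and no more" rule is a two-way comparison between a lookup table and the data it is derived from. A minimal Python sketch, assuming both are plain code-to-name mappings; compare_code_to_name is a hypothetical name, and the messages only imitate the style of the Output section:

```python
def compare_code_to_name(lookup, data):
    """Compare a code-to-name lookup table against the authoritative data.

    lookup: code -> name, as stored in the lookup module
    data:   code -> canonical name, as collected from the data submodules
    """
    problems = []

    # Everything in the data must be in the lookup table, with the same name.
    for code, name in data.items():
        if code not in lookup:
            problems.append(f"{code} ({name}) is missing")
        elif lookup[code] != name:
            problems.append(
                f"{lookup[code]}, the canonical name for the code {code}, "
                f"is wrong; it should be {name}."
            )

    # Nothing may be in the lookup table that is absent from the data.
    for code, name in lookup.items():
        if code not in data:
            problems.append(
                f"The code {code} and the canonical name {name} should be removed."
            )

    return problems
```

The same comparison, with keys and values swapped, would cover a canonical-name-to-code table.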

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
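
The parent-family and "at least one member" rules can be sketched as follows. This is a Python illustration with made-up family codes; check_families is a hypothetical name, and storing each language's family under a "family" key is a simplification for the example.

```python
def check_families(families, languages):
    """Return problems for family entries.

    families:  family code -> dict, optionally with a "family" (parent) key
    languages: language code -> dict, optionally with a "family" key
    """
    problems = []
    used = set()

    # A family is in use if some language belongs to it...
    for data in languages.values():
        family = data.get("family")
        if family is not None:
            used.add(family)

    # ...or if some other family names it as its parent.
    for code, data in families.items():
        parent = data.get("family")
        if parent is None:
            continue
        if parent not in families:
            problems.append(f"{code}: parent family {parent!r} is not a valid family code")
        elif parent != code:
            used.add(parent)

    for code in families:
        if code not in used:
            problems.append(f"{code}: no language or subfamily belongs to this family")

    return problems
```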

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".