Module:data consistency check
This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.
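As a rough illustration of the kind of check described above, the following Python sketch flags unrecognised data keys and family codes that do not exist. The real module is written in Lua; the table layout, the `ALLOWED_KEYS` set, the function name, the message wording for the family check, and the sample data (including the `tai` family assigned to `th-new`) are all assumptions made for illustration and do not mirror the module's actual code.

```python
# Hypothetical, simplified schema: the real data modules allow many more keys.
ALLOWED_KEYS = {"canonical_name", "family", "scripts", "ancestors"}


def check_languages(languages, families):
    """Return a list of human-readable discrepancy messages."""
    problems = []
    for code, data in sorted(languages.items()):
        name = data.get("canonical_name", code)
        # Flag keys that are not part of the (assumed) schema.
        for key in sorted(set(data) - ALLOWED_KEYS):
            problems.append(f"The data key {key} for {name} ({code}) is invalid.")
        # Flag family codes that are absent from the family table.
        family = data.get("family")
        if family is not None and family not in families:
            problems.append(f"{name} ({code}) has the invalid family code {family}.")
    return problems


# Invented sample data, loosely modelled on the first item of the output.
languages = {
    "th-new": {"canonical_name": "Hacked Thai", "family": "tai", "preprocess_links": True},
    "xx": {"canonical_name": "Example", "family": "nope"},
}
families = {"tai": {"canonical_name": "Tai"}}

for message in check_languages(languages, families):
    print(message)
```

Collecting messages in a list rather than raising on the first problem matches the way the output below reports every discrepancy in one pass.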
Output
Discrepancies detected:
- The data key preprocess_links for Hacked Thai (th-new) is invalid.
- The code ira-mid and the canonical name อิเรเนียนกลาง should be removed; they are not found in Module:families/data.
- The code ira-old and the canonical name อิเรเนียนเก่า should be removed; they are not found in Module:families/data.
- กลุ่มภาษาHuasteca Nahuatl (azc-hua) has no child families or languages.
- กลุ่มภาษาอินโด-อารยันเก่า (inc-old) has no child families or languages.
- กลุ่มภาษาMey-Sartang (sit-khm) has no child families or languages.
- อินุกติตุต (iu) has the override_translit field set, but no transliteration module.
- นอร์เวย์แบบบุ๊กมอล (nb) has เดนมาร์ก (da) set as an ancestor, but is not in the กลุ่มภาษาสแกนดิเนเวียนตะวันออก (gmq-eas).
- นอร์เวย์แบบบุ๊กมอล (nb) has นอร์เวย์กลาง (gmq-mno) set as an ancestor, but is not in the กลุ่มภาษาสแกนดิเนเวียนตะวันตก (gmq-wes).
- ออสซีเซีย (os) is in the กลุ่มภาษาScythian (xsc) and has ออสซีเชียเก่า (oos) set as an ancestor, but it is not possible to form an ancestral chain between them.
- ฮินดูสตานีแบบแคริบเบียน (hns) has โภชปุระ (bho) set as an ancestor, but is not in the กลุ่มภาษาBihari (inc-bih).
- ฮินดูสตานีแบบแคริบเบียน (hns) has อวัธ (awa) set as an ancestor, but is not in the กลุ่มภาษาฮินดีตะวันออก (inc-hie).
- ออสซีเชียเก่า (oos) is in the กลุ่มภาษาScythian (xsc) and has Proto-Ossetic (os-pro) set as an ancestor, but it is not possible to form an ancestral chain between them.
- Jassic (ysc) is in the กลุ่มภาษาScythian (xsc) and has ออสซีเชียเก่า (oos) set as an ancestor, but it is not possible to form an ancestral chain between them.
- Proto-Bua (alv-bua-pro) does not have the expected name "Buaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBua (alv-bua).
- Proto-Cangin (alv-cng-pro) does not have the expected name "Canginดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCangin (alv-cng).
- Proto-Edekiri (alv-edk-pro) does not have the expected name "Edekiriดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาEdekiri (alv-edk).
- Proto-Edoid (alv-edo-pro) does not have the expected name "Edoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาEdoid (alv-edo).
- Proto-Fali (alv-fli-pro) does not have the expected name "Faliดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาFali (alv-fli).
- Proto-Guang (alv-gng-pro) does not have the expected name "Guangดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาGuang (alv-gng).
- Proto-Central Togo (alv-gtm-pro) does not have the expected name "Ghana-Togo Mountainดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาGhana-Togo Mountain (alv-gtm).
- Proto-Heiban (alv-hei-pro) does not have the expected name "Heibanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาHeiban (alv-hei).
- Proto-Idomoid (alv-ido-pro) does not have the expected name "Idomoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาIdomoid (alv-ido).
- Proto-Igboid (alv-igb-pro) does not have the expected name "Igboidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาIgboid (alv-igb).
- Proto-Kwa (alv-kwa-pro) does not have the expected name "Kwaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKwa (alv-kwa).
- Proto-Mumuye (alv-mum-pro) does not have the expected name "Mumuyeดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMumuye (alv-mum).
- Proto-Nupoid (alv-nup-pro) does not have the expected name "Nupoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNupoid (alv-nup).
- Proto-Yoruba (alv-yor-pro) does not have the expected name "Yorubaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาYoruba (alv-yor).
- Proto-Yoruboid (alv-yrd-pro) does not have the expected name "Yoruboidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาYoruboid (alv-yrd).
- Proto-Apachean (apa-pro) does not have the expected name "Apacheanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาApachean (apa).
- Proto-Athabaskan (ath-pro) does not have the expected name "Athabaskanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAthabaskan (ath).
- Proto-Arawa (auf-pro) does not have the expected name "Arauanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาArauan (auf).
- Proto-Arnhem (aus-arn-pro) does not have the expected name "Arnhemดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาArnhem (aus-arn).
- Proto-Central New South Wales (aus-cww-pro) does not have the expected name "Central New South Walesดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCentral New South Wales (aus-cww).
- Proto-Daly (aus-dal-pro) does not have the expected name "Dalyดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาDaly (aus-dal).
- Proto-Nyulnyulan (aus-nyu-pro) does not have the expected name "Nyulnyulanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNyulnyulan (aus-nyu).
- Proto-Pama-Nyungan (aus-pam-pro) does not have the expected name "Pama-Nyunganดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPama-Nyungan (aus-pam).
- Proto-Iwaidjan (aus-wdj-pro) does not have the expected name "Iwaidjanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาIwaidjan (aus-wdj).
- Proto-Amuesha-Chamicuro (awd-amc-pro) has a proto-language code associated with the invalid code "awd-amc".
- Proto-Kampa (awd-kmp-pro) has a proto-language code associated with the invalid code "awd-kmp".
- Proto-Nawiki (awd-nwk-pro) does not have the expected name "Nawikiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNawiki (awd-nwk).
- Proto-Arawak (awd-pro) does not have the expected name "Arawakanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาArawakan (awd).
- Proto-Paresi-Waura (awd-prw-pro) has a proto-language code associated with the invalid code "awd-prw".
- Proto-Ta-Arawak (awd-taa-pro) does not have the expected name "Ta-Arawakanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTa-Arawakan (awd-taa).
- Proto-Cupan (azc-cup-pro) does not have the expected name "Cupanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCupan (azc-cup).
- Proto-Numic (azc-num-pro) does not have the expected name "Numicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNumic (azc-num).
- Proto-Takic (azc-tak-pro) does not have the expected name "Takicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTakic (azc-tak).
- Proto-Sotho-Tswana (bnt-sts-pro) does not have the expected name "Sotho-Tswanaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSotho-Tswana (bnt-sts).
- Proto-Batak (btk-pro) does not have the expected name "Batakดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBatak (btk).
- Proto-Abkhaz-Abaza (cau-abz-pro) does not have the expected name "Abkhaz-Abazaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAbkhaz-Abaza (cau-abz).
- Proto-Andian (cau-and-pro) does not have the expected name "Andianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAndian (cau-and).
- Proto-Avaro-Andian (cau-ava-pro) does not have the expected name "Avaro-Andianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAvaro-Andian (cau-ava).
- Proto-Circassian (cau-cir-pro) does not have the expected name "Circassianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCircassian (cau-cir).
- Proto-Dargwa (cau-drg-pro) does not have the expected name "Dargwaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาDargwa (cau-drg).
- Proto-Lezghian (cau-lzg-pro) does not have the expected name "Lezghianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาLezghian (cau-lzg).
- Proto-Tsezian (cau-tsz-pro) does not have the expected name "Tsezianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTsezian (cau-tsz).
- Proto-Chibchan (cba-pro) does not have the expected name "Chibchanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChibchan (cba).
- Proto-Masa (cdc-mas-pro) does not have the expected name "Masaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMasa (cdc-mas).
- Proto-Caddoan (cdd-pro) does not have the expected name "Caddoanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCaddoan (cdd).
- Proto-Chimakuan (chi-pro) does not have the expected name "Chimakuanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChimakuan (chi).
- Proto-Mari (chm-pro) does not have the expected name "Mariดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMari (chm).
- Proto-Bongo-Bagirmi (csu-bba-pro) does not have the expected name "Bongo-Bagirmiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBongo-Bagirmi (csu-bba).
- Proto-Mangbetu (csu-maa-pro) does not have the expected name "Mangbetuดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMangbetu (csu-maa).
- Proto-Central Sudanic (csu-pro) does not have the expected name "Central Sudanicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCentral Sudanic (csu).
- Proto-Sara (csu-sar-pro) does not have the expected name "Saraดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSara (csu-sar).
- Proto-Highland East Cushitic (cus-hec-pro) does not have the expected name "Highland East Cushiticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาHighland East Cushitic (cus-hec).
- Proto-Cushitic (cus-pro) does not have the expected name "Cushiticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCushitic (cus).
- Proto-South Cushitic (cus-sou-pro) does not have the expected name "South Cushiticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSouth Cushitic (cus-sou).
- Proto-Western Mande (dmn-mdw-pro) does not have the expected name "Western Mandeดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาWestern Mande (dmn-mdw).
- Proto-Mande (dmn-pro) does not have the expected name "Mandeดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMande (dmn).
- Proto-Rukai (dru-pro) has a proto-language code associated with Rukai (dru), which is not a family.
- Proto-Eskimo-Aleut (esx-pro) does not have the expected name "Eskimo-Aleutดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาEskimo-Aleut (esx).
- บาสก์ดั้งเดิม (euq-pro) does not have the expected name "วาสโกนิกดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาวาสโกนิก (euq).
- Proto-Gbaya (gba-pro) does not have the expected name "Gbayaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาGbaya (gba).
- นอร์สดั้งเดิม (gmq-pro) does not have the expected name "เจอร์แมนิกเหนือดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาเจอร์แมนิกเหนือ (gmq).
- Proto-Nuristani (iir-nur-pro) does not have the expected name "Nuristaniดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNuristani (iir-nur).
- Proto-Ijoid (ijo-pro) does not have the expected name "Ijoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาIjoid (ijo).
- Proto-Kamta (inc-krn-pro) does not have the expected name "KRNB lectsดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKRNB lects (inc-krn).
- Proto-Komisenian (ira-kms-pro) does not have the expected name "Komisenianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKomisenian (ira-kms).
- Proto-Munji-Yidgha (ira-mny-pro) does not have the expected name "Munji-Yidghaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMunji-Yidgha (ira-mny).
- Proto-Medo-Parthian (ira-mpr-pro) does not have the expected name "Medo-Parthianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMedo-Parthian (ira-mpr).
- Proto-Sanglechi-Ishkashimi (ira-sgi-pro) does not have the expected name "Sanglechi-Ishkashimiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSanglechi-Ishkashimi (ira-sgi).
- Proto-Shughni-Roshani (ira-shr-pro) does not have the expected name "Shughni-Roshaniดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาShughni-Roshani (ira-shr).
- Proto-Shughni-Yazghulami (ira-shy-pro) does not have the expected name "Shughni-Yazghulamiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาShughni-Yazghulami (ira-shy).
- Proto-Shughni-Yazghulami-Munji (ira-sym-pro) does not have the expected name "Shughni-Yazghulami-Munjiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาShughni-Yazghulami-Munji (ira-sym).
- Proto-Zaza-Gorani (ira-zgr-pro) does not have the expected name "Zaza-Goraniดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาZaza-Gorani (ira-zgr).
- Proto-North Iroquoian (iro-nor-pro) does not have the expected name "North Iroquoianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNorth Iroquoian (iro-nor).
- Proto-Iroquoian (iro-pro) does not have the expected name "Iroquoianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาIroquoian (iro).
- Proto-Ryukyuan (jpx-ryu-pro) does not have the expected name "รีวกีวอันดั้งเดิม", even though it is the proto-language of the กลุ่มภาษารีวกีวอัน (jpx-ryu).
- Proto-Khanty (kca-pro) does not have the expected name "Khantyดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKhanty (kca).
- Proto-Khoe (khi-kho-pro) does not have the expected name "Khoeดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKhoe (khi-kho).
- Proto-Kru (kro-pro) does not have the expected name "Kruดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKru (kro).
- Proto-Atayalic (map-ata-pro) does not have the expected name "Atayalicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAtayalic (map-ata).
- Kelantan Peranakan Hokkien (mis-hkl) has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.
- Proto-Aslian (mkh-asl-pro) does not have the expected name "Aslianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAslian (mkh-asl).
- Proto-Bahnaric (mkh-ban-pro) does not have the expected name "Bahnaricดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBahnaric (mkh-ban).
- Proto-Katuic (mkh-kat-pro) does not have the expected name "Katuicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKatuic (mkh-kat).
- เขมรดั้งเดิม (mkh-kmr-pro) does not have the expected name "ขแมริกดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาขแมริก (mkh-kmr).
- มอญดั้งเดิม (mkh-mnc-pro) does not have the expected name "โมนิกดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาโมนิก (mkh-mnc).
- ปะหล่องดั้งเดิม (mkh-pal-pro) does not have the expected name "Palaungicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPalaungic (mkh-pal).
- Proto-Pearic (mkh-pea-pro) does not have the expected name "Pearicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPearic (mkh-pea).
- Proto-Pakanic (mkh-pkn-pro) does not have the expected name "Pakanicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPakanic (mkh-pkn).
- Proto-Mansi (mns-pro) does not have the expected name "Mansiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMansi (mns).
- Proto-Chumash (nai-chu-pro) does not have the expected name "Chumashanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChumashan (nai-chu).
- Proto-Chinookan (nai-ckn-pro) does not have the expected name "Chinookanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChinookan (nai-ckn).
- Proto-Kalapuyan (nai-klp-pro) does not have the expected name "Kalapuyanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKalapuyan (nai-klp).
- Proto-Maidun (nai-mdu-pro) does not have the expected name "Maiduanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMaiduan (nai-mdu).
- Proto-Mixe-Zoque (nai-miz-pro) does not have the expected name "Mixe-Zoqueanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMixe-Zoquean (nai-miz).
- Proto-Muskogean (nai-mus-pro) does not have the expected name "Muskogeanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMuskogean (nai-mus).
- Proto-Plateau Penutian (nai-plp-pro) does not have the expected name "Plateau Penutianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPlateau Penutian (nai-plp).
- Proto-Pomo (nai-pom-pro) does not have the expected name "Pomoanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPomoan (nai-pom).
- Proto-Siouan-Catawban (nai-sca-pro) does not have the expected name "Siouan-Catawbanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSiouan-Catawban (nai-sca).
- Proto-Totozoquean (nai-tot-pro) does not have the expected name "Totozoqueanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTotozoquean (nai-tot).
- Proto-Tsimshianic (nai-tsi-pro) does not have the expected name "Tsimshianicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTsimshianic (nai-tsi).
- Proto-Utian (nai-utn-pro) does not have the expected name "Utianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาUtian (nai-utn).
- Proto-Trans-New Guinea (ngf-pro) does not have the expected name "Trans-New Guineaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTrans-New Guinea (ngf).
- Proto-Eastern Oti-Volta (nic-eov-pro) does not have the expected name "Eastern Oti-Voltaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาEastern Oti-Volta (nic-eov).
- Proto-Gurunsi (nic-gns-pro) does not have the expected name "Gurunsiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาGurunsi (nic-gns).
- Proto-Grassfields (nic-grf-pro) does not have the expected name "Grassfieldsดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาGrassfields (nic-grf).
- Proto-Jukunoid (nic-jkn-pro) does not have the expected name "Jukunoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาJukunoid (nic-jkn).
- Proto-Lower Cross River (nic-lcr-pro) does not have the expected name "Lower Cross Riverดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาLower Cross River (nic-lcr).
- Proto-Ogoni (nic-ogo-pro) does not have the expected name "Ogoniดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOgoni (nic-ogo).
- Proto-Oti-Volta (nic-ovo-pro) does not have the expected name "Oti-Voltaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOti-Volta (nic-ovo).
- Proto-Plateau (nic-plt-pro) does not have the expected name "Plateauดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาPlateau (nic-plt).
- Proto-Ubangian (nic-ubg-pro) does not have the expected name "Ubangianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาUbangian (nic-ubg).
- Proto-Upper Cross River (nic-ucr-pro) does not have the expected name "Upper Cross Riverดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาUpper Cross River (nic-ucr).
- Proto-Chatino (omq-cha-pro) does not have the expected name "Chatinoดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChatino (omq-cha).
- Proto-Mazatec (omq-maz-pro) does not have the expected name "Mazatecanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMazatecan (omq-maz).
- Proto-Mixtecan (omq-mix-pro) does not have the expected name "Mixtecanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMixtecan (omq-mix).
- Proto-Mixtec (omq-mxt-pro) does not have the expected name "Mixtecดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMixtec (omq-mxt).
- Proto-Oto-Pamean (omq-otp-pro) does not have the expected name "Oto-Pameanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOto-Pamean (omq-otp).
- Proto-Oto-Manguean (omq-pro) does not have the expected name "Oto-Mangueanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOto-Manguean (omq).
- Proto-Trique (omq-tri-pro) does not have the expected name "Triqueดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTrique (omq-tri).
- Proto-Zapotecan (omq-zap-pro) does not have the expected name "Zapotecanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาZapotecan (omq-zap).
- Proto-Zapotec (omq-zpc-pro) does not have the expected name "Zapotecดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาZapotec (omq-zpc).
- Proto-Aroid (omv-aro-pro) does not have the expected name "Aroidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาAroid (omv-aro).
- Proto-Dizoid (omv-diz-pro) does not have the expected name "Dizoidดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาDizoid (omv-diz).
- Proto-Omotic (omv-pro) does not have the expected name "Omoticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOmotic (omv).
- Proto-Ossetic (os-pro) has a proto-language code associated with ออสซีเซีย (os), which is not a family.
- Proto-Ossetic (os-pro) lists the invalid language code "xln" as its ancestor.
- Proto-Otomi (oto-otm-pro) does not have the expected name "Otomiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOtomi (oto-otm).
- Proto-Otomian (oto-pro) does not have the expected name "Otomianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOtomian (oto).
- Proto-North Halmahera (paa-nha-pro) does not have the expected name "North Halmaheraดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNorth Halmahera (paa-nha).
- Proto-Bungku-Tolaki (poz-btk-pro) does not have the expected name "Bungku-Tolakiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBungku-Tolaki (poz-btk).
- Proto-Halmahera-Cenderawasih (poz-hce-pro) does not have the expected name "Halmahera-Cenderawasihดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาHalmahera-Cenderawasih (poz-hce).
- Proto-Chukotko-Kamchatkan (qfa-cka-pro) does not have the expected name "Chukotko-Kamchatkanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาChukotko-Kamchatkan (qfa-cka).
- Proto-Hurro-Urartian (qfa-hur-pro) does not have the expected name "Hurro-Urartianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาHurro-Urartian (qfa-hur).
- Proto-Kadu (qfa-kad-pro) does not have the expected name "Kaduดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKadu (qfa-kad).
- Proto-Kam-Sui (qfa-kms-pro) does not have the expected name "Kam-Suiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKam-Sui (qfa-kms).
- เกาหลีดั้งเดิม (qfa-kor-pro) does not have the expected name "โคเรียนดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาโคเรียน (qfa-kor).
- ไหลดั้งเดิม (qfa-lic-pro) does not have the expected name "ไลอิกดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาไลอิก (qfa-lic).
- Proto-Ongan (qfa-ong-pro) does not have the expected name "Onganดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาOngan (qfa-ong).
- Proto-Yeniseian (qfa-yen-pro) does not have the expected name "Yeniseianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาYeniseian (qfa-yen).
- Proto-Yukaghir (qfa-yuk-pro) does not have the expected name "Yukaghirดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาYukaghir (qfa-yuk).
- Proto-Boran (sai-bor-pro) does not have the expected name "Boranดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาBoran (sai-bor).
- Proto-Cariban (sai-car-pro) does not have the expected name "Caribanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCariban (sai-car).
- Proto-Cerrado (sai-cer-pro) does not have the expected name "Cerradoดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCerrado (sai-cer).
- Proto-Central Jê (sai-cje-pro) does not have the expected name "Central Jêดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาCentral Jê (sai-cje).
- Proto-Jê (sai-jee-pro) does not have the expected name "Jêดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาJê (sai-jee).
- Proto-Northern Jê (sai-nje-pro) does not have the expected name "Northern Jêดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNorthern Jê (sai-nje).
- Proto-Southern Jê (sai-sje-pro) does not have the expected name "Southern Jêดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSouthern Jê (sai-sje).
- Proto-Taranoan (sai-tar-pro) does not have the expected name "Taranoanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTaranoan (sai-tar).
- Proto-Witotoan (sai-wit-pro) does not have the expected name "Witotoanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาWitotoan (sai-wit).
- Proto-Salish (sal-pro) does not have the expected name "Salishanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSalishan (sal).
- Proto-Daju (sdv-daj-pro) does not have the expected name "Dajuดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาDaju (sdv-daj).
- Proto-Eastern Jebel (sdv-eje-pro) does not have the expected name "Eastern Jebelดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาEastern Jebel (sdv-eje).
- Proto-Nilotic (sdv-nil-pro) does not have the expected name "Niloticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNilotic (sdv-nil).
- Proto-Nyima (sdv-nyi-pro) does not have the expected name "Nyimaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNyima (sdv-nyi).
- Proto-Taman (sdv-tmn-pro) does not have the expected name "Tamanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTaman (sdv-tmn).
- Proto-Selkup (sel-pro) does not have the expected name "Selkupดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSelkup (sel).
- Proto-Siouan (sio-pro) does not have the expected name "Siouanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSiouan (sio).
- Proto-Bai (sit-bai-pro) does not have the expected name "ไป๋ดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาไป๋ (sit-bai).
- Proto-Hrusish (sit-hrs-pro) does not have the expected name "Hrusishดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาHrusish (sit-hrs).
- Proto-Kham (sit-kha-pro) does not have the expected name "Khamดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKham (sit-kha).
- Proto-Kho-Bwa (sit-khb-pro) does not have the expected name "Kho-Bwaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKho-Bwa (sit-khb).
- Proto-Puroik (sit-khp-pro) has a proto-language code associated with the invalid code "sit-khp".
- Proto-Western Kho-Bwa (sit-khw-pro) does not have the expected name "Western Kho-Bwaดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาWestern Kho-Bwa (sit-khw).
- Proto-Tamangic (sit-tam-pro) does not have the expected name "Tamangicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTamangic (sit-tam).
- Proto-Tani (sit-tan-pro) does not have the expected name "Taniดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTani (sit-tan).
- Proto-Songhay (son-pro) does not have the expected name "Songhayดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSonghay (son).
- Proto-Kuliak (ssa-klk-pro) does not have the expected name "Kuliakดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKuliak (ssa-klk).
- Proto-Koman (ssa-kom-pro) does not have the expected name "Komanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKoman (ssa-kom).
- Proto-Nilo-Saharan (ssa-pro) does not have the expected name "Nilo-Saharanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNilo-Saharan (ssa).
- Proto-Samoyedic (syd-pro) does not have the expected name "Samoyedicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSamoyedic (syd).
- Proto-Kuki-Chin (tbq-kuk-pro) does not have the expected name "Kukishดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาKukish (tbq-kuk).
- Proto-Lalo (tbq-lal-pro) does not have the expected name "Laloดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาLalo (tbq-lal).
- Proto-Tungusic (tuw-pro) does not have the expected name "Tungusicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTungusic (tuw).
- Proto-Mordvinic (urj-mdv-pro) does not have the expected name "Mordvinicดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาMordvinic (urj-mdv).
- Proto-Tatic (xme-ttc-pro) does not have the expected name "Taticดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาTatic (xme-ttc).
- Proto-Na-Dene (xnd-pro) does not have the expected name "Na-Deneดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาNa-Dene (xnd).
- Proto-Scythian (xsc-pro) does not have the expected name "Scythianดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาScythian (xsc).
- Proto-Saka (xsc-sak-pro) does not have the expected name "Sakanดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSakan (xsc-sak).
- Proto-Sarmatian (xsc-sar-pro) has a proto-language code associated with the invalid code "xsc-sar".
- Proto-Saka-Wakhi (xsc-skw-pro) does not have the expected name "Saka-Wakhiดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาSaka-Wakhi (xsc-skw).
- Proto-Yupik (ypk-pro) does not have the expected name "Yupikดั้งเดิม", even though it is the proto-language of the กลุ่มภาษาYupik (ypk).
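The proto-language messages above suggest a rule along these lines: a code ending in `-pro` is expected to belong to an existing family, and its name is expected to be the family name followed by the suffix ดั้งเดิม. The Python sketch below illustrates that rule only. The function name, the flat `code -> name` tables, and the exact message wording are assumptions, and the "which is not a family" case seen in the output is deliberately left out.

```python
PROTO_SUFFIX = "ดั้งเดิม"  # suffix seen in the expected names in the output above


def check_proto_names(languages, families):
    """languages and families map codes to canonical names (assumed layout)."""
    problems = []
    for code, name in sorted(languages.items()):
        if not code.endswith("-pro"):
            continue  # only proto-language codes are examined
        family_code = code[: -len("-pro")]
        if family_code not in families:
            problems.append(
                f"{name} ({code}) has a proto-language code associated "
                f'with the invalid code "{family_code}".'
            )
            continue
        expected = families[family_code] + PROTO_SUFFIX
        if name != expected:
            problems.append(
                f'{name} ({code}) does not have the expected name "{expected}", '
                f"even though it is the proto-language of "
                f"{families[family_code]} ({family_code})."
            )
    return problems


# Sample data: the first two entries come from the output above,
# the "xx" entries are hypothetical and show a name that passes the check.
languages = {
    "alv-bua-pro": "Proto-Bua",
    "awd-amc-pro": "Proto-Amuesha-Chamicuro",
    "xx-pro": "Xx" + PROTO_SUFFIX,
}
families = {"alv-bua": "Bua", "xx": "Xx"}

for message in check_proto_names(languages, families):
    print(message)
```

Under this reading, most of the entries above are flagged simply because their canonical names still use the English "Proto-" prefix instead of the Thai suffix.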
- apc is set as an ISO 639-3 code on multiple items: Q56593 and Q22809485.
- kjv is set as an ISO 639-3 code on multiple items: Q838165 and Q31199873.
- msn is set as an ISO 639-3 code on multiple items: Q3331111 and Q3563857.
- ttt is set as an ISO 639-3 code on multiple items: Q56489 and Q123964178.
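A duplicate-code report like the one above can be produced by grouping item identifiers by code and keeping the groups with more than one member. The sketch below shows that grouping in Python. The function name and the input shape (a list of `(item, code)` pairs) are assumptions, and the entry `("Q0", "zzz")` is a hypothetical placeholder for an unshared code.

```python
from collections import defaultdict


def find_shared_codes(item_codes):
    """Map each code to its items, keeping only codes used by several items."""
    by_code = defaultdict(list)
    for item, code in item_codes:
        by_code[code].append(item)
    return {code: items for code, items in by_code.items() if len(items) > 1}


# Pairs taken from the output above, plus one hypothetical unshared code.
pairs = [
    ("Q56593", "apc"),
    ("Q22809485", "apc"),
    ("Q838165", "kjv"),
    ("Q31199873", "kjv"),
    ("Q0", "zzz"),  # hypothetical item, not from the source
]

for code, items in find_shared_codes(pairs).items():
    print(f"{code} is set as an ISO 639-3 code on multiple items: {' and '.join(items)}.")
```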
- The canonical name อักษรBlissymbolic (Blis) is missing.
- Blissymbols, the canonical name for the code Blis, is wrong; it should be Blissymbolic.
- The canonical name อักษรKhwarezmian (Chrs) is missing.
- Chorasmian, the canonical name for the code Chrs, is wrong; it should be Khwarezmian.
- Nag Mundari, the canonical name for the code Nagm, is wrong; it should be Mundari Bani.
- The canonical name อักษรMundari Bani (Nagm) is missing.
- Old North Arabian, the canonical name for the code Narb, is wrong; it should be Ancient North Arabian.
- The canonical name อักษรAncient North Arabian (Narb) is missing.
- The canonical name อักษรOld Turkic (Orkh) is missing.
- Orkhon runes, the canonical name for the code Orkh, is wrong; it should be Old Turkic.
- The canonical name อักษรProto-Cuneiform (Pcun) is missing.
- The canonical name อักษรProto-Elamite (Pelm) is missing.
- The canonical name อักษรProto-Sinaitic (Psin) is missing.
- The canonical name อักษรAncient South Arabian (Sarb) is missing.
- Old South Arabian, the canonical name for the code Sarb, is wrong; it should be Ancient South Arabian.
- Blissymbols, the canonical name for the code Blis, is wrong; it should be Blissymbolic.
- Chorasmian, the canonical name for the code Chrs, is wrong; it should be Khwarezmian.
- Nag Mundari, the canonical name for the code Nagm, is wrong; it should be Mundari Bani.
- Old North Arabian, the canonical name for the code Narb, is wrong; it should be Ancient North Arabian.
- Orkhon runes, the canonical name for the code Orkh, is wrong; it should be Old Turkic.
- The code Pcun (อักษรProto-Cuneiform) is missing.
- The code Pelm (อักษรProto-Elamite) is missing.
- The code Psin (อักษรProto-Sinaitic) is missing.
- Old South Arabian, the canonical name for the code Sarb, is wrong; it should be Ancient South Arabian.
- อักษรBlissymbolic (Blis) is not used by any language and has no characters listed for auto-detection.
- อักษรCypro-Minoan (Cpmn) is not used by any language.
- อักษรฮิรางานะ (Hira) is not used by any language.
- อักษรคานะ (Hrkt) is not used by any language.
- อักษรImage-rendered (Image) is not used by any language and has no characters listed for auto-detection.
- อักษรสัทอักษรสากล (Ipach) is not used by any language and has no characters listed for auto-detection.
- อักษรMoon (Moon) is not used by any language and has no characters listed for auto-detection.
- รหัสมอร์ส (Morse) is not used by any language and has no characters listed for auto-detection.
- สัญกรณ์ดนตรี (Music) is not used by any language.
- อักษรไม่ระบุ (None) is not used by any language and has no characters listed for auto-detection.
- อักษรProto-Cuneiform (Pcun) is not used by any language and has no characters listed for auto-detection.
- อักษรProto-Elamite (Pelm) is not used by any language and has no characters listed for auto-detection.
- อักษรProto-Sinaitic (Psin) is not used by any language and has no characters listed for auto-detection.
- อักษรRongorongo (Roro) is not used by any language and has no characters listed for auto-detection.
- อักษรRumi numerals (Rumin) is not used by any language.
- สัญญาณธง (Semap) is not used by any language and has no characters listed for auto-detection.
- อักษรVisible Speech (Visp) is not used by any language and has no characters listed for auto-detection.
- อักษรmathematical notation (Zmth) is not used by any language.
- อักษรสัญลักษณ์ (Zsym) is not used by any language.
- อักษรยังไม่กำหนด (Zyyy) is not used by any language and has no characters listed for auto-detection.
- อักษรยังไม่มีรหัส (Zzzz) is not used by any language and has no characters listed for auto-detection.
- The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, tt-Arab, ku-Arab, ota-Arab, mzn-Arab and sd-Arab are currently alias codes. Only one code should be used in the data.
- The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.
Checks performed
For multiple data modules:
- Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
- Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
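The Wikidata rule above can be sketched as a small predicate. This is only an illustration: `is_valid_wikidata_id` is a hypothetical name, not a function of the module, and the module's own `check_wikidata_item` (listed further down) accepts only positive integers.

```lua
-- Hypothetical helper, not part of the module: returns true if `id` is a
-- positive integer or a string made of "Q" followed by decimal digits.
local function is_valid_wikidata_id(id)
	if type(id) == "number" then
		return id > 0 and id % 1 == 0
	elseif type(id) == "string" then
		return id:match("^Q%d+$") ~= nil
	end
	return false
end

print(is_valid_wikidata_id(9217))    -- integer form
print(is_valid_wikidata_id("Q9217")) -- string form
print(is_valid_wikidata_id("9217"))  -- rejected: digits without the Q prefix
```

Anchoring the pattern with `^` and `$` is what rejects values such as "Q12a" or " Q12".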
The following must be true of the data used by Module:languages:
- Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
- If sort_key is given, it may either be a string, or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be longer than or equal in length to the to array.
- If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true.
- Have no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
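As an illustration of the standardChars rule, the sketch below wraps a candidate string in a negated set and lets Lua's pattern matcher decide whether the result is well-formed. It is an assumption-laden simplification: `is_valid_standard_chars` is a hypothetical name, and plain `string.find` stands in for the Unicode-aware matcher that the module itself uses.

```lua
-- Hypothetical helper, not part of the module. Returns true if `chars` yields
-- a well-formed pattern when placed inside "[^...]"; otherwise returns false
-- plus the reason.
local function is_valid_standard_chars(chars)
	if type(chars) ~= "string" then
		return false, "not a string"
	end
	-- Matching against the empty string is enough: a malformed set raises an
	-- error as soon as the matcher tries to read it.
	local ok, err = pcall(string.find, "", "[^" .. chars .. "]")
	return ok, (not ok) and err or nil
end

print(is_valid_standard_chars("A-Za-z")) -- well-formed set
print(is_valid_standard_chars("abc%"))   -- trailing % escapes the closing bracket
```

Using `pcall` keeps a single bad entry from aborting the whole run, which is the same reason the module reports discrepancies instead of raising errors.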
Checks not performed:
- If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).
These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.
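If one did want to test the first two conditions, a check could look roughly like the following. This is a sketch only: `exports_function`, the injected `loader`, and the stub module names are all hypothetical, and on a wiki the loader would be `require` or a safe wrapper around it.

```lua
-- Hypothetical helper, not part of the module. `loader` stands in for
-- require and is injected so that the sketch runs outside MediaWiki.
local function exports_function(loader, modname, func_name)
	local ok, mod = pcall(loader, modname)
	return ok and type(mod) == "table" and type(mod[func_name]) == "function"
end

-- Stub modules, for illustration only.
local stubs = {
	["Module:xx-translit"] = { tr = function(text, lang, sc) return text end },
	["Module:xx-sortkey"] = {},
}

local function stub_loader(name)
	local mod = stubs[name]
	if not mod then
		error("module not found: " .. name)
	end
	return mod
end

print(exports_function(stub_loader, "Module:xx-translit", "tr"))
print(exports_function(stub_loader, "Module:xx-sortkey", "makeSortKey"))
```

Loading every transliteration and sort key module in one run would be expensive, which is a further practical reason to leave this to the errors that surface in entries.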
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
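The "all the codes and names, and no more" requirement amounts to checking the two lookup tables against the data in both directions. A minimal sketch, using a hypothetical `check_maps` function and toy tables rather than the real modules:

```lua
-- Hypothetical helper, not part of the module.
-- data:         code -> { canonical_name, ... }
-- code_to_name: code -> canonical name
-- name_to_code: canonical name -> code
local function check_maps(data, code_to_name, name_to_code)
	local problems = {}
	-- Everything in the data must appear in both maps.
	for code, entry in pairs(data) do
		local name = entry[1]
		if code_to_name[code] ~= name then
			problems[#problems + 1] = "code map: wrong or missing entry for " .. code
		end
		if name_to_code[name] ~= code then
			problems[#problems + 1] = "name map: wrong or missing entry for " .. name
		end
	end
	-- Nothing in the maps may be absent from the data.
	for code in pairs(code_to_name) do
		if not data[code] then
			problems[#problems + 1] = "code map: extra code " .. code
		end
	end
	for name, code in pairs(name_to_code) do
		if not (data[code] and data[code][1] == name) then
			problems[#problems + 1] = "name map: extra name " .. name
		end
	end
	table.sort(problems) -- pairs() order is unspecified, so sort for stable output
	return problems
end

local data = { en = { "English" }, fr = { "French" } }
local code_to_name = { en = "English", fr = "French", xx = "Old Name" }
local name_to_code = { English = "en" }

for _, problem in ipairs(check_maps(data, code_to_name, name_to_code)) do
	print(problem)
end
```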
The following must be true of the data used by Module:etymology languages:
- canonicalName must be given.
- parent must be given and must be a valid language, family or etymology-only language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
Codes in Module:families data must:
- Have canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
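The "at least one language or subfamily" rule can be sketched as a two-pass scan: mark every family that something points to, then report the rest. The function name and the toy data are hypothetical, and the sketch assumes the parent family sits in field 3 of each entry, as in the module source below.

```lua
-- Hypothetical helper, not part of the module.
-- families:  code -> { canonical_name, wikidata_item, parent_family }
-- languages: code -> { canonical_name, wikidata_item, family, ... }
local function find_empty_families(families, languages)
	local nonempty = {}
	for _, lang in pairs(languages) do
		if lang[3] then
			nonempty[lang[3]] = true
		end
	end
	for code, fam in pairs(families) do
		local parent = fam[3]
		-- A family does not count as its own child.
		if parent and parent ~= code then
			nonempty[parent] = true
		end
	end
	local empty = {}
	for code in pairs(families) do
		if not nonempty[code] then
			empty[#empty + 1] = code
		end
	end
	table.sort(empty)
	return empty
end

local families = {
	ine = { "Indo-European" },
	gem = { "Germanic", nil, "ine" },
	xyz = { "Hypothetical empty family" },
}
local languages = { en = { "English", 1860, "gem" } }

local empty = find_empty_families(families, languages)
print(table.concat(empty, ", "))
```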
Codes in Module:scripts data must:
- Have canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
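Each "have no data keys besides these" rule is a whitelist check. The sketch below is a simplified, standalone variant of the idea behind `check_data_keys` in the source that follows; `find_invalid_keys` and the sample entry are hypothetical.

```lua
-- Hypothetical helper, not part of the module. Returns a sorted list of the
-- keys of `data` that are not in `valid_keys`.
local function find_invalid_keys(data, valid_keys)
	local allowed = {}
	for _, key in ipairs(valid_keys) do
		allowed[key] = true
	end
	local invalid = {}
	for key in pairs(data) do
		if not allowed[key] then
			invalid[#invalid + 1] = tostring(key)
		end
	end
	table.sort(invalid)
	return invalid
end

-- Key list taken from the script rules above.
local script_keys = {
	"canonicalName", "otherNames", "parent", "systems",
	"wikipedia_article", "characters", "direction",
}

-- Made-up entry with one misspelt key ("dir" instead of "direction").
local entry = { canonicalName = "Latin", characters = "A-Za-z", dir = "ltr" }

print(table.concat(find_invalid_keys(entry, script_keys), ", "))
```

Turning the list into a set first makes each lookup constant-time, which matters when the same key list is applied to thousands of entries.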
-- TODO:
-- ietf_subtag field used with a 2/3-letter language/family code except qaa-qtz, or a 4-letter script code.
-- Check against files containing up-to-date ISO data, to cross-check validity.
local export = {}
local mw = mw
local require = require
local string = string
local m_languages = require("Module:languages")
local m_languages_data_all = require("Module:languages/data/all")
local m_languages_codes = require("Module:languages/code to canonical name")
local m_languages_canonical_names = require("Module:languages/canonical names")
local m_etym_languages_data = require("Module:etymology languages/data")
local m_etym_languages_codes = require("Module:etymology languages/code to canonical name")
local m_etym_languages_canonical_names = require("Module:etymology languages/canonical names")
local m_families = require("Module:families")
local m_families_data = require("Module:families/data")
local m_families_codes = require("Module:families/code to canonical name")
local m_families_canonical_names = require("Module:families/canonical names")
local m_load = require("Module:load")
local m_scripts = require("Module:scripts")
local m_scripts_data = require("Module:scripts/data")
local m_scripts_codes = require("Module:scripts/code to canonical name")
local m_scripts_canonical_names = require("Module:scripts/by name")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local Array = require("Module:array")
local add_indefinite_article = m_str_utils.add_indefinite_article
local codepoint = m_str_utils.codepoint
local concat = table.concat
local dump = mw.dumpObject
local format = string.format
local gcodepoint = m_str_utils.gcodepoint
local get_data_module_name = m_languages.getDataModuleName
local get_family_by_code = m_families.getByCode
local get_family_by_canonical_name = m_families.getByCanonicalName
local get_indefinite_article = m_str_utils.get_indefinite_article
local get_language_by_code = m_languages.getByCode
local get_language_by_canonical_name = m_languages.getByCanonicalName
local get_script_by_code = m_scripts.getByCode
local get_script_by_canonical_name = m_scripts.getByCanonicalName
local gmatch = string.gmatch
local gsub = string.gsub
local insert = table.insert
local ipairs = ipairs
local is_positive_integer = require("Module:math").is_positive_integer
local isutf8 = mw.ustring.isutf8
local json_decode = mw.text.jsonDecode
local language_link = require("Module:links").language_link
local list_to_set = m_table.listToSet
local list_to_text = mw.text.listToText
local load_data = m_load.load_data
local log = mw.log
local make_family = m_families.makeObject
local make_lang = m_languages.makeObject
local make_script = m_scripts.makeObject
local match = string.match
local new_title = mw.title.new
local next = next
local pairs = pairs
local pcall = pcall
local remove_comments = m_str_utils.remove_comments
local safe_require = m_load.safe_require
local sorted_pairs = m_table.sortedPairs
local split = m_str_utils.split
local sub = string.sub
local table_len = m_table.length
local tag_text = require("Module:script utilities").tag_text
local type = type
local umatch = m_str_utils.match
local aliases = require("Module:languages/data").aliases
local messages
local function discrepancy(modname, ...)
local success, result = pcall(function(...)
messages[modname]:insert(format(...))
end, ...)
if not success then
log(result, ...)
end
end
local messages_mt = {}
function messages_mt:__index(k)
local val = Array()
self[k] = val
return val
end
local all_codes = {}
local language_names = {}
local etym_language_names = {}
local family_names = {}
local script_names = {}
local nonempty_families = {}
local allowed_empty_families = {tbq = true}
local nonempty_scripts = {}
local function link(obj, code_first)
return type(obj) == "string" and obj or
code_first and format("<code>%s</code> (%s)", obj:getCode(), obj:makeCategoryLink()) or
format("%s (<code>%s</code>)", obj:makeCategoryLink(), obj:getCode())
end
local function check_data_keys(...)
local valid_keys = Array(...):toSet()
return function (modname, obj, data)
local invalid_keys
for k in pairs(data) do
if not valid_keys[k] then
if not invalid_keys then
invalid_keys = Array(k)
else
invalid_keys:insert(k)
end
end
end
if invalid_keys == nil then
return
end
local plural = #invalid_keys ~= 1
discrepancy(modname,
"The data key%s %s for %s %s invalid.",
plural and "s" or "",
invalid_keys:map(function(key)
return "<code>" .. key .. "</code>"
end):concat(", "),
link(obj),
plural and "are" or "is"
)
end
end
-- Modification of isArray in [[Module:table]].
-- This assumes all keys are either integers or non-numbers.
-- If there are fractional numbers, the results might be incorrect.
-- For instance, find_gap{"a", "b", [0.5] = true} evaluates to 3, but there
-- isn't a gap at 3 in the sense of there being an integer key greater than 3.
local function find_gap(t, can_contain_non_number_keys)
local i = 0
for k in pairs(t) do
if not (can_contain_non_number_keys and type(k) ~= "number") then
i = i + 1
if t[i] == nil then
return i
end
end
end
end
local function check_true_or_string_or_nil(modname, obj, data, key)
local field = data[key]
if not (field == nil or field == true or type(field) == "string") then
discrepancy(modname,
"%s has %s <code>%s</code> value that is not <code>nil</code>, <code>true</code> or a string: <code>%s</code>",
link(obj), get_indefinite_article(key), key, dump(data[key])
)
end
end
local function check_array(modname, obj, data, array_name, subarray_name, can_contain_non_number_keys)
local subtable = data
if subarray_name then
subtable = assert(data[subarray_name], subarray_name)
end
local array_type = type(subtable[array_name])
if array_type == "table" then
local gap = find_gap(subtable[array_name], can_contain_non_number_keys)
if gap then
discrepancy(modname,
"The %s array in %sthe data table for %s has a gap at index %d.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
link(obj),
gap
)
else
return true
end
else
discrepancy(modname,
"The %s field in %sthe data table for %s should be an array (table) but is %s.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
link(obj),
array_type == "nil" and "nil" or "a " .. array_type
)
end
end
local function check_no_alias_codes(modname, mod_data)
local lookup, discrepancies = {}, {}
for k, v in pairs(mod_data) do
local check = lookup[v]
if check then
discrepancies[check] = discrepancies[check] or {"<code>" .. check .. "</code>"}
insert(discrepancies[check], "<code>" .. k .. "</code>")
else
lookup[v] = k
end
end
for _, v in pairs(discrepancies) do
discrepancy(modname,
"The codes %s are currently alias codes. Only one code should be used in the data.",
list_to_text(v, ", ", " and ")
)
end
end
local function check_wikidata_item(modname, obj, data, key)
local data_item = data[key]
if data_item == nil or is_positive_integer(data_item) then
return
end
discrepancy(modname,
"%s has a Wikidata item ID that is not a positive integer: <code>%s</code>",
link(obj), dump(data_item)
)
end
local function check_name_field(modname, obj, data, canonical_name, data_key, allow_nested)
local array = data[data_key]
if not array then
return
end
check_array(modname, obj, data, data_key, nil, true)
local names = {}
local function check_other_name(other_name)
if other_name == canonical_name then
discrepancy(modname,
"%s has its canonical name (<code>%s</code>) repeated in the table of <code>%s</code>.",
link(obj), dump(canonical_name), data_key
)
end
if names[other_name] then
discrepancy(modname,
"The name %s is found twice or more in the list of <code>%s</code> for %s.",
other_name, data_key, link(obj)
)
end
names[other_name] = true
end
for _, other_name in ipairs(array) do
if type(other_name) == "table" then
if not allow_nested then
discrepancy(modname,
"A nested table is found in the list of <code>%s</code> for %s, but isn't allowed.",
data_key, link(obj)
)
else
for _, on in ipairs(other_name) do
check_other_name(on)
end
end
else
check_other_name(other_name)
end
end
end
local function check_other_names_aliases_varieties(modname, obj, data, canonical_name)
if data.otherNames then
check_name_field(modname, obj, data, canonical_name, "otherNames")
end
if data.aliases then
check_name_field(modname, obj, data, canonical_name, "aliases")
end
if data.varieties then
check_name_field(modname, obj, data, canonical_name, "varieties", true)
end
end
local function validate_pattern(pattern, modname, obj, standardChars)
if type(pattern) ~= "string" then
return discrepancy(modname,
"\"%s\", the %spattern for %s, is not a string.",
pattern, standardChars and "standard character " or "", link(obj)
)
elseif not isutf8(pattern) then
return discrepancy(modname,
"%s specifies a pattern for %scharacter detection which is not valid UTF-8: <code>%s</code>",
link(obj), standardChars and "standard " or "", dump(pattern)
)
end
local ranges
for lower, higher in gmatch(pattern, "(.[\128-\191]*)%-%%?(.[\128-\191]*)") do
if codepoint(lower) >= codepoint(higher) then
ranges = ranges or Array()
insert(ranges, { lower, higher })
end
end
if ranges and ranges[1] then
local plural = #ranges ~= 1 and "s" or ""
discrepancy(modname,
"%s specifies an invalid pattern " ..
"for %scharacter detection: <code>%s</code>. The first codepoint%s " ..
"in the range%s %s %s must be less than the second.",
link(obj), standardChars and "standard " or "", dump(pattern), plural, plural,
ranges:map(function(range)
return format(range[1] .. "-" .. range[2] .. " (U+%X, U+%X)", codepoint(range[1]), codepoint(range[2]))
end):concat(", "),
#ranges ~= 1 and "are" or "is"
)
end
local success, result = pcall(umatch, "", "[" .. pattern .. "]")
if not success then
discrepancy(modname,
"%s specifies an invalid pattern for %scharacter detection: <code>%s</code> (%s)",
link(obj), standardChars and "standard " or "", dump(pattern), result
)
end
end
local remove_exceptions_addition = 0xF0000
local maximum_code_point = 0x10FFFF
local remove_exceptions_maximum_code_point = maximum_code_point - remove_exceptions_addition
local function check_entry_name_sortkey_display(modname, obj, data, replacements_name)
local replacements = data[replacements_name]
if type(replacements) == "string" then
if not (replacements_name == "sort_key" or replacements_name == "entry_name") then
discrepancy(modname,
"The %s field in the data table for %s must be a table.",
replacements_name, link(obj)
)
end
return
end
if (replacements.from ~= nil) ~= (replacements.to ~= nil) then
discrepancy(modname,
"The <code>from</code> and <code>to</code> arrays in the <code>%s</code> table for %s are not both defined or both undefined.",
replacements_name, link(obj)
)
elseif replacements.from then
for _, key in ipairs {"from", "to"} do
check_array(modname, obj, data, key, replacements_name)
end
end
if replacements.remove_diacritics and type(replacements.remove_diacritics) ~= "string" then
discrepancy(modname,
"The <code>remove_diacritics</code> field in the <code>%s</code> table for %s must be a string.",
replacements_name, link(obj)
)
end
if replacements.remove_exceptions then
if check_array(modname, obj, data, "remove_exceptions", replacements_name) then
for sequence_i, sequence in ipairs(replacements.remove_exceptions) do
local code_point_i = 0
for code_point in gcodepoint(sequence) do
code_point_i = code_point_i + 1
if code_point > remove_exceptions_maximum_code_point then
discrepancy(modname,
"Code point #%d (0x%04X) in field #%d of the <code>remove_exceptions</code> array for %s is over U+%04X.",
code_point_i, code_point, sequence_i, link(obj), remove_exceptions_maximum_code_point
)
end
end
end
end
end
if replacements.from and replacements.to
and table_len(replacements.to) > table_len(replacements.from) then
discrepancy(modname,
"The <code>from</code> array in the <code>%s</code> table for %s must be longer than or the same length as the <code>to</code> array.",
replacements_name, link(obj)
)
end
end
local function has_ancestor(lang, code)
for _, anc in ipairs(lang:getAncestors()) do
if code == anc:getCode() or has_ancestor(anc, code) then
return true
end
end
end
local function get_default_ancestors(lang)
if lang:hasType("language", "etymology-only") then
local parent = lang:getParent()
if not has_ancestor(parent, lang:getCode()) then
return parent:getAncestorCodes()
end
end
local fam_code, def_anc = lang:getFamilyCode()
while fam_code and fam_code ~= "qfa-not" do
local fam = m_families_data[fam_code]
def_anc = fam.protoLanguage or
m_languages_data_all[fam_code .. "-pro"] and fam_code .. "-pro" or
m_etym_languages_data[fam_code .. "-pro"] and fam_code .. "-pro"
if def_anc and def_anc ~= lang:getCode() then
return {def_anc}
end
fam_code = fam[3]
end
end
local function iterate_ancestor(obj, modname, anc_code)
local anc = get_language_by_code(anc_code, nil, true)
if not anc then
discrepancy(modname,
"%s lists the invalid language code <code>%s</code> as its ancestor.",
link(obj), dump(anc_code)
)
return
end
local anc_fam = anc:getFamily()
if not anc_fam then
discrepancy(modname,
"%s has no family.",
link(anc)
)
return
end
local anc_fam_code = anc_fam:getCode()
local def_ancs = get_default_ancestors(obj)
if def_ancs then
for _, def_anc in ipairs(def_ancs) do
def_anc = get_language_by_code(def_anc, nil, true)
if def_anc and (
anc_code == def_anc:getCode() or
has_ancestor(def_anc, anc_code) or
def_anc:hasParent(anc_code) and not has_ancestor(anc, def_anc:getCode())
) then
discrepancy(modname,
"%s has the ancestor %s listed in its ancestor field, which is redundant, since it is determined to be ancestral automatically.",
link(obj), link(anc)
)
end
end
end
if not obj:inFamily(anc_fam_code) then
discrepancy(modname,
"%s has %s set as an ancestor, but is not in the %s.",
link(obj), link(anc), link(anc_fam)
)
end
local fam, proto = obj
repeat
fam = fam:getFamily()
proto = fam and fam:getProtoLanguage()
until proto or not fam or fam:getCode() == "qfa-not"
if proto and not (
proto:getCode() == anc:getCode() or
proto:hasAncestor(anc:getCode()) or
anc:hasAncestor(proto:getCode())
) then
local fam = obj:getFamily()
discrepancy(modname,
"%s is in the %s and has %s set as an ancestor, but it is not possible to form an ancestral chain between them.",
link(obj), link(fam), link(anc)
)
end
end
local function check_ancestors(modname, obj, data)
local ancestors = data.ancestors
if not ancestors then
return
elseif type(ancestors) == "string" then
ancestors = split(ancestors, "%s*,%s*", true)
end
for _, anc in ipairs(ancestors) do
iterate_ancestor(obj, modname, anc)
end
end
local function check_code_to_name_and_name_to_code_maps(
source_module_type,
source_module_description,
code_to_module_map, name_to_code_map,
code_to_name_modname, code_to_name_module,
name_to_code_modname, name_to_code_module
)
local function check_code_and_name(modname, code, canonical_name)
-- Check the code is in code_to_module_map and that it didn't originate from the wrong data module.
local check_mod = code_to_module_map[code] or code_to_module_map[aliases[code]]
if not (check_mod and match(check_mod, "^" .. source_module_type .. "/data")) then
if not name_to_code_map[canonical_name] then
discrepancy(modname,
"The code <code>%s</code> and the canonical name %s should be removed; they are not found in %s.",
code, canonical_name, source_module_description
)
else
discrepancy(modname,
"<code>%s</code>, the code for the canonical name %s, is wrong; it should be <code>%s</code>.",
code, canonical_name, name_to_code_map[canonical_name]
)
end
elseif not name_to_code_map[canonical_name] then
local data_table = require("Module:" .. code_to_module_map[code])[code]
discrepancy(modname,
"%s, the canonical name for the code <code>%s</code>, is wrong; it should be %s.",
canonical_name, code, data_table[1]
)
end
end
for code, canonical_name in pairs(code_to_name_module) do
check_code_and_name(code_to_name_modname, code, canonical_name)
end
for canonical_name, code in pairs(name_to_code_module) do
check_code_and_name(name_to_code_modname, code, canonical_name)
end
end
local function check_extraneous_extra_data(
data_modname, data_module, extra_data_modname, extra_data_module)
for code, _ in pairs(extra_data_module) do
if not data_module[code] then
discrepancy(extra_data_modname,
"The code <code>%s</code> is not found in [[Module:%s]], and should be removed from [[Module:%s]].",
code, data_modname, extra_data_modname
)
end
end
end
-- TODO: add collision check between the canonical names "X" and "X [Ll]anguage".
local function check_languages(frame)
local check_language_data_keys = check_data_keys(
1, 2, 3, 4, -- canonical name, Wikidata item, family, scripts
"display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
)
local function check_language(modname, code, data, extra_modname, extra_data)
local obj, code_modname, canonical_name = make_lang(code, data, true), get_data_module_name(code), data[1]
if code_modname ~= modname then
if code_modname == "languages/data/2" then
discrepancy(modname,
"%s is a two-letter code, so should be moved to [[Module:%s]].",
link(obj), code_modname
)
elseif code_modname == "languages/data/exceptional" then
discrepancy(modname,
"%s is an exceptional code, as it does not consist of two or three lowercase letters, so should be moved to [[Module:%s]].",
link(obj), code_modname
)
else
discrepancy(modname,
"%s is a three-letter code beginning with '%s', so should be moved to [[Module:%s]].",
link(obj), sub(code, 1, 1), code_modname
)
end
end
check_language_data_keys(modname, obj, data)
if all_codes[code] then
discrepancy(modname,
"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
code, all_codes[code]
)
else
if not m_languages_codes[code] then
discrepancy("languages/code to canonical name",
"The code %s is missing.",
link(obj, true)
)
end
all_codes[code] = modname
end
if sub(code, -4) == "-pro" then
local fam_code = sub(code, 1, -5)
local fam = get_language_by_code(fam_code, nil, true, true)
if not fam then
discrepancy(modname,
"%s has a proto-language code associated with the invalid code <code>%s</code>.",
link(obj), dump(fam_code)
)
elseif not fam:hasType("family") then
discrepancy(modname,
"%s has a proto-language code associated with %s, which is not a family.",
link(obj), link(fam)
)
else
local expected_name = fam:getCanonicalName() .. "ดั้งเดิม"
if canonical_name ~= expected_name then
discrepancy(modname,
"%s does not have the expected name \"%s\", even though it is the proto-language of the %s.",
link(obj), expected_name, link(fam)
)
end
end
end
if not canonical_name then
discrepancy(modname,
"The code <code>%s</code> has no canonical name specified.",
code
)
elseif language_names[canonical_name] then
local canonical_lang = get_language_by_canonical_name(canonical_name)
if not canonical_lang then
discrepancy(modname,
"%s has a canonical name that cannot be looked up.",
link(obj)
)
elseif data.main_code ~= canonical_lang:getCode() then
discrepancy(modname,
"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(obj), language_names[canonical_name]
)
end
else
if not m_languages_canonical_names[canonical_name] then
discrepancy("languages/canonical names",
"The canonical name %s is missing.",
link(obj)
)
end
language_names[canonical_name] = code
end
check_wikidata_item(modname, obj, data, 2)
if extra_data then
check_other_names_aliases_varieties(modname, obj, extra_data, canonical_name)
end
local lang_type = data.type
if lang_type and not (lang_type == "regular" or lang_type == "reconstructed" or lang_type == "appendix-constructed") then
discrepancy(modname,
"%s is of the invalid type <code>%s</code>.",
link(obj), lang_type
)
end
if data.aliases then
discrepancy(modname,
"%s has an <code>aliases</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
link(obj), modname, extra_modname
)
end
if data.varieties then
discrepancy(modname,
"%s has the <code>varieties</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
link(obj), modname, extra_modname
)
end
if data.otherNames then
discrepancy(modname,
"%s has the <code>otherNames</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
link(obj), modname, extra_modname
)
end
if not extra_data then
discrepancy(extra_modname,
"%s has data in [[Module:%s]], but does not have corresponding data in [[Module:%s]].",
link(obj), modname, extra_modname
)
--[[elseif extra_data.otherNames then
discrepancy(extra_modname,
"%s has <code>otherNames</code> key, but these should be changed to either <code>aliases</code> or <code>varieties</code>.",
link(obj)
)]]
end
local sc = data[4]
if sc then
if type(sc) == "string" then
sc = split(sc, "%s*,%s*", true)
end
if type(sc) == "table" then
if not sc[1] then
discrepancy(modname,
"%s has no scripts listed.",
link(obj)
)
else
for _, sccode in ipairs(sc) do
local cur_sc = m_scripts_data[sccode]
if not (cur_sc or sccode == "All" or sccode == "Hants") then
discrepancy(modname,
"%s lists the invalid script code <code>%s</code>.",
link(obj), dump(sccode)
)
--[[elseif not cur_sc.characters then
discrepancy(modname,
"%s lists the %s, which does not have any characters.",
link(obj), link(get_script_by_code(sccode))
)]]
end
nonempty_scripts[sccode] = true
end
end
else
discrepancy(modname,
"The %s field for %s must be a table or string.",
4, link(obj)
)
end
end
if data.ancestors then
check_ancestors(modname, obj, data)
end
if data[3] then
local family = data[3]
if not m_families_data[family] then
discrepancy(modname,
"%s has the invalid family code <code>%s</code>.",
link(obj), dump(family)
)
end
nonempty_families[family] = true
end
if data.sort_key then
check_entry_name_sortkey_display(modname, obj, data, "sort_key")
end
if data.entry_name then
check_entry_name_sortkey_display(modname, obj, data, "entry_name")
end
if data.display then
check_entry_name_sortkey_display(modname, obj, data, "display")
end
if data.standardChars then
if type(data.standardChars) == "table" then
local sccodes = {}
-- Field 4 may be absent or malformed, so guard before iterating.
for _, sccode in ipairs(type(sc) == "table" and sc or {}) do
sccodes[sccode] = true
end
for sccode in pairs(data.standardChars) do
if not (sccodes[sccode] or sccode == 1) then
discrepancy(modname,
"The field %s in the <code>standardChars</code> table for %s does not match any script for that language.",
sccode, link(obj)
)
end
end
elseif data.standardChars and type(data.standardChars) ~= "string" then
discrepancy(modname,
"The <code>standardChars</code> field in the data table for %s must be a string or table.",
link(obj)
)
end
end
check_true_or_string_or_nil(modname, obj, data, "override_translit")
check_true_or_string_or_nil(modname, obj, data, "link_tr")
if data.override_translit and not data.translit then
discrepancy(modname,
"%s has the <code>override_translit</code> field set, but no transliteration module",
link(obj)
)
end
end
local function check_module(modname)
local mod_data = load_data("Module:" .. modname)
local extra_modname = modname .. "/extra"
local extra_mod_data = load_data("Module:" .. extra_modname)
for code, data in pairs(mod_data) do
check_language(modname, code, data, extra_modname, extra_mod_data[code])
end
check_no_alias_codes(modname, mod_data)
check_no_alias_codes(extra_modname, extra_mod_data)
check_extraneous_extra_data(modname, mod_data, extra_modname, extra_mod_data)
end
-- Check two-letter codes
check_module(
"languages/data/2"
)
-- Check three-letter codes
for i = 0x61, 0x7A do -- a to z
check_module(
format("languages/data/3/%c", i)
)
end
-- Check exceptional codes
check_module(
"languages/data/exceptional"
)
-- These checks must be done while all_codes only contains language codes:
-- that is, after language data modules have been processed, but before
-- etymology languages, families, and scripts have.
check_code_to_name_and_name_to_code_maps(
"languages",
"a submodule of [[Module:languages]]",
all_codes, language_names,
"languages/code to canonical name", m_languages_codes,
"languages/canonical names", m_languages_canonical_names
)
--[=[
-- Check [[Template:langname-lite]]
local modname = "Template:langname-lite"
for code, name in gmatch(remove_comments(new_title(modname):getContent()), "\n\t*|#*([^\n]+)=([^\n]*)") do
if #code > 1 and code ~= "default" then
for _, code in pairs(split(code, "|", true)) do
local lang = get_language_by_code(code, nil, true, true)
if match(name, "etymcode") then
local nonEtym_name = frame:preprocess(name)
local nonEtym_real_name = lang:getFullName()
if nonEtym_name ~= nonEtym_real_name then
discrepancy(modname,
"Code: <code>%s</code>. Saw name: %s. Expected name: %s.",
code, nonEtym_name, nonEtym_real_name
)
end
name = frame:preprocess(gsub(name, "{{{allow etym|}}}", "1"))
elseif match(name, "familycode") then
name = match(name, "familycode|(.-)|")
else
name = name
end
if not lang then
discrepancy(modname,
"Code: <code>%s</code>. Saw name: %s. Language not present in data.",
code, name
)
else
local real_name = lang:getCanonicalName()
if name ~= real_name then
discrepancy(modname,
"Code: <code>%s</code>. Saw name: %s. Expected name: %s.",
code, name, real_name
)
end
end
end
end
end
--]=]
end
local function check_etym_languages()
local modname = "etymology languages/data"
local check_etymology_language_data_keys = check_data_keys(
1, 2, 3, 4, -- canonical name, Wikidata item, family, scripts
"parent", "display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "main_code", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
)
local checked = {}
for code, data in pairs(m_etym_languages_data) do
local obj, canonical_name, parent = make_lang(code, data, true), data[1], data.parent
check_etymology_language_data_keys(modname, obj, data)
if all_codes[code] then
discrepancy(modname,
"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
code, all_codes[code]
)
else
if not m_etym_languages_codes[code] then
discrepancy("etymology languages/code to canonical name",
"The code %s is missing.",
link(obj, true)
)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname,
"The code <code>%s</code> has no canonical name specified.",
code
)
elseif language_names[canonical_name] then
local canonical_lang = get_language_by_canonical_name(canonical_name, nil, true)
if not canonical_lang then
discrepancy(modname,
"%s has a canonical name that cannot be looked up.",
link(obj)
)
elseif data.main_code ~= canonical_lang:getCode() then
discrepancy(modname,
"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(obj), language_names[canonical_name]
)
end
else
if not m_etym_languages_canonical_names[canonical_name] then
discrepancy("etymology languages/canonical names",
"The canonical name %s is missing.",
link(obj)
)
end
etym_language_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, obj, data, canonical_name)
if parent then
if type(parent) ~= "string" then
discrepancy(modname,
"%s has a parent code that is %s rather than a string.",
link(obj), parent == nil and "nil" or "a " .. type(parent)
)
elseif not (m_languages_data_all[parent] or m_families_data[parent] or m_etym_languages_data[parent]) then
discrepancy(modname,
"%s has the invalid parent code <code>%s</code>.",
link(obj), dump(parent)
)
end
nonempty_families[parent] = true
else
discrepancy(modname,
"%s has no parent code.",
link(obj)
)
end
if data.ancestors then
check_ancestors(modname, obj, data)
end
if data[3] then
local family = data[3]
if not m_families_data[family] then
discrepancy(modname,
"%s has the invalid family code <code>%s</code>.",
link(obj), dump(family)
)
end
nonempty_families[family] = true
end
check_wikidata_item(modname, obj, data, 2)
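-- Walk up the chain of parents. Meeting a code that is already on the
-- current stack means the chain loops back on itself; codes verified on
-- an earlier walk (`checked`) are not walked again.
local stack = {}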
while data do
if checked[code] then
break
elseif stack[code] then
local parent = data.parent
discrepancy(modname,
"%s has a cyclic parental relationship to %s.",
link(make_lang(code, data, true)),
link(get_language_by_code(parent, nil, true))
)
break
end
stack[code] = true
code = data.parent
data = m_etym_languages_data[code]
end
for code in pairs(stack) do
checked[code] = true
end
end
check_no_alias_codes(modname, m_etym_languages_data)
check_code_to_name_and_name_to_code_maps(
"etymology languages",
"[[Module:etymology languages/data]]",
all_codes, etym_language_names,
"etymology languages/code to canonical name", m_etym_languages_codes,
"etymology languages/canonical names", m_etym_languages_canonical_names)
end
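-- Illustrative sketch only (not called by this module): the parent-chain walk
-- used in check_etym_languages, reduced to a plain table. `find_cycles` and
-- `parent_of` are hypothetical names; `parent_of` is assumed to map each code
-- to its parent code, with no entry for a root.
--
--   local function find_cycles(parent_of)
--       local checked, cyclic = {}, {}
--       for start in pairs(parent_of) do
--           local stack, code = {}, start
--           while code ~= nil and not checked[code] do
--               if stack[code] then
--                   -- Seen twice on the same walk: the chain is cyclic.
--                   cyclic[#cyclic + 1] = code
--                   break
--               end
--               stack[code] = true
--               code = parent_of[code]
--           end
--           -- Everything on this walk has now been examined once.
--           for seen_code in pairs(stack) do
--               checked[seen_code] = true
--           end
--       end
--       return cyclic
--   end
--
-- With parent_of = {a = "b", b = "a", c = "a"} the sketch reports a single
-- code (either "a" or "b", depending on iteration order), because the shared
-- `checked` set stops the same loop from being reported again.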
-- TODO: add collision check between the canonical names "X" and "X [Ll]anguages".
local function check_families()
local modname = "families/data"
local check_family_data_keys = check_data_keys(
1, 2, 3, -- canonical name, Wikidata item, (parent) family
"type", "ietf_subtag",
"protoLanguage", "otherNames", "aliases", "varieties"
)
local checked, double_check_if_empty = {["qfa-not"] = true}, {}
for code, data in pairs(m_families_data) do
local obj, canonical_name, family, protolang = make_family(code, data), data[1], data[3], data.protoLanguage
check_family_data_keys(modname, obj, data)
if all_codes[code] then
discrepancy(modname,
"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
code, all_codes[code]
)
else
if not m_families_codes[code] then
discrepancy("families/code to canonical name",
"The code %s is missing.",
link(obj, true)
)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname,
"The code <code>%s</code> has no canonical name specified.",
code
)
elseif family_names[canonical_name] then
local canonical_family = get_family_by_canonical_name(canonical_name)
if not canonical_family then
discrepancy(modname,
"%s has a canonical name that cannot be looked up.",
link(obj)
)
elseif data.main_code ~= canonical_family:getCode() then
discrepancy(modname,
"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(obj), family_names[canonical_name]
)
end
else
if not m_families_canonical_names[canonical_name] then
discrepancy("families/canonical names",
"The canonical name %s is missing.",
link(obj)
)
end
family_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, obj, data, canonical_name)
if family then
if family == code and code ~= "qfa-not" then
discrepancy(modname,
"%s has itself as its family.",
link(obj)
)
elseif not m_families_data[family] then
discrepancy(modname,
"%s has the invalid parent family code <code>%s</code>.",
link(obj), dump(family)
)
end
nonempty_families[family] = true
end
if protolang then
local protolang_obj = get_language_by_code(protolang, nil, true)
if not protolang_obj then
discrepancy(modname,
"%s has the invalid proto-language code <code>%s</code>.",
link(obj), dump(protolang)
)
elseif protolang == code .. "-pro" then
discrepancy(modname,
"%s has %s listed as its proto-language, which is redundant, since it is determined to be the proto-language automatically.",
link(obj), link(protolang_obj)
)
elseif sub(protolang, -4) == "-pro" then
discrepancy(modname,
"%s has %s listed as its proto-language, which is supposed to be the proto-language for the family <code>%s</code>.",
link(obj), link(protolang_obj), sub(protolang, 1, -5)
)
end
end
check_wikidata_item(modname, obj, data, 2)
-- This could be a false positive if a child family occurs on a later
-- iteration, so set aside any family that fails for a second check. This
-- avoids having to iterate through the whole list of families again once
-- nonempty_families has been fully populated.
if not (nonempty_families[code] or allowed_empty_families[code]) then
double_check_if_empty[code] = obj
end
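-- Walk up the chain of parent families (data[3]). Meeting a code that is
-- already on the current stack means the chain is cyclic.
local stack = {}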
while data do
if checked[code] then
break
elseif stack[code] then
local parent = data[3]
discrepancy(modname,
"%s has a cyclic familial relationship to %s.",
link(make_family(code, data)),
link(get_family_by_code(parent))
)
break
end
stack[code] = true
code = data[3]
data = m_families_data[code]
end
for code in pairs(stack) do
checked[code] = true
end
end
-- Any families set aside as candidates for having no children are checked
-- again, now that nonempty_families is definitely complete.
for code, obj in next, double_check_if_empty do
if not (nonempty_families[code] or allowed_empty_families[code]) then
discrepancy(modname,
"%s has no child families or languages.",
link(obj)
)
end
end
check_no_alias_codes(modname, m_families_data)
check_code_to_name_and_name_to_code_maps(
"families",
"[[Module:families/data]]",
all_codes, family_names,
"families/code to canonical name", m_families_codes,
"families/canonical names", m_families_canonical_names)
end
-- TODO: add collision check between the canonical names "X" and "X [Ss]cript".
local function check_scripts()
local modname = "scripts/data"
local check_script_data_keys = check_data_keys(
1, 2, 3, -- canonical name, Wikidata item, writing systems
"otherNames", "aliases", "varieties", "parent", "ietf_subtag", "type",
"wikipedia_article", "ranges", "characters", "spaces", "capitalized", "translit", "direction",
"character_category", "normalizationFixes", "sort_by_scraping"
)
-- Just to satisfy requirements of check_code_to_name_and_name_to_code_maps.
local script_code_to_module_map = {}
for code, data in pairs(m_scripts_data) do
local obj, canonical_name = make_script(code, data), data[1]
if not m_scripts_codes[code] and #code == 4 then
discrepancy("scripts/code to canonical name",
"The code %s is missing.",
link(obj, true)
)
end
check_script_data_keys(modname, obj, data)
if not canonical_name then
discrepancy(modname,
"The code <code>%s</code> has no canonical name specified.",
code
)
elseif script_names[canonical_name] then
local canonical_script = get_script_by_canonical_name(canonical_name)
if not canonical_script then
discrepancy(modname,
"%s has a canonical name that cannot be looked up.",
link(obj)
)
--[[elseif data.main_code ~= canonical_script:getCode() then
discrepancy(modname,
"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(obj), script_names[canonical_name]
)]]
end
else
if not m_scripts_canonical_names[canonical_name] and #code == 4 then
discrepancy("scripts/by name",
"The canonical name %s is missing.",
link(obj)
)
end
script_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, obj, data, canonical_name)
if not nonempty_scripts[code] then
discrepancy(modname,
"%s is not used by any language%s.",
link(obj), data.characters and ""
or " and has no characters listed for auto-detection")
--[[elseif not data.characters then
discrepancy(modname,
"%s has no characters listed for auto-detection.",
link(obj)
)--]]
end
if data.characters then
validate_pattern(data.characters, modname, obj, false)
end
check_wikidata_item(modname, obj, data, 2)
script_code_to_module_map[code] = modname
end
check_no_alias_codes(modname, m_scripts_data)
check_code_to_name_and_name_to_code_maps(
"scripts",
"a submodule of [[Module:scripts]]",
script_code_to_module_map, script_names,
"scripts/code to canonical name", m_scripts_codes,
"scripts/by name", m_scripts_canonical_names)
end
-- FIXME: this is quite messy.
local function check_wikidata_languages()
local data = json_decode(new_title("Module:languages/data/wikidata.json"):getContent())
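-- One table per ISO 639 part (1, 2, 3 and 5), mapping each code to the
-- Wikidata item ID (or list of IDs) on which it is set.
local seen = {{}, {}, {}, [5] = {}}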
for _, item in ipairs(data) do
local id = item.id
for k, v in pairs(item) do
if k ~= "id" then
local _seen = seen[k]
for _, code in ipairs(v) do
local _code = code[1]
local _type = type(_seen[_code])
if _type == "table" then
insert(_seen[_code], id)
elseif _type == "string" then
_seen[_code] = {_seen[_code], id}
else
_seen[_code] = id
end
end
end
end
end
local modname = "languages/data/wikidata.json"
for k, v in pairs(seen) do
for code, ids in pairs(v) do
if type(ids) == "table" then
local t = {}
for i, id in ipairs(ids) do
t[i] = format("<code>[[d:%s|%s]]</code>", id, id)
end
discrepancy(modname,
"<code>%s</code> is set as an ISO 639-%d code on multiple items: %s.",
code, k, list_to_text(t)
)
end
end
end
end
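-- Illustration of the `seen` bookkeeping in check_wikidata_languages, using
-- hypothetical item IDs Q1, Q2 and Q3 that all claim the ISO 639-1 code "aa":
--
--   after Q1:  seen[1]["aa"] == "Q1"                -- first sighting: a string
--   after Q2:  seen[1]["aa"] == {"Q1", "Q2"}        -- promoted to a table
--   after Q3:  seen[1]["aa"] == {"Q1", "Q2", "Q3"}  -- appended with insert()
--
-- Only table values, i.e. codes set on two or more items, are reported as
-- discrepancies by the second loop.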
local function check_labels()
local check_label_data_keys = check_data_keys(
"display", "Wikipedia", "glossary",
"plain_categories", "topical_categories", "pos_categories", "regional_categories", "sense_categories",
"omit_preComma", "omit_postComma", "omit_preSpace",
"deprecated", "track"
)
local function check_label(modname, code, data)
local _type = type(data)
if _type == "table" then
check_label_data_keys(modname, code, data)
elseif _type ~= "string" then
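-- add_indefinite_article is assumed to return the type name together
-- with its article (e.g. "a number"), so only two placeholders are needed.
discrepancy(modname,
"The data for the label <code>%s</code> is %s; only tables and strings are allowed.",
code, add_indefinite_article(_type)
)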
end
end
for _, suffix in ipairs{"", "/regional", "/topical"} do
local modname = "Module:labels/data" .. suffix
local label_data = require(modname)
for label, data in pairs(label_data) do
check_label(modname, label, data)
end
end
for code in pairs(m_languages_codes) do
local modname = "Module:labels/data/lang/" .. code
local module = safe_require(modname)
if module then
for label, data in pairs(module) do
check_label(modname, label, data)
end
end
end
end
local function check_zh_trad_simp()
local m_ts = require("Module:zh/data/ts")
local m_st = require("Module:zh/data/st")
local ruby = require("Module:ja-ruby").ruby_auto
local lang = get_language_by_code("zh")
local Hant = get_script_by_code("Hant")
local Hans = get_script_by_code("Hans")
local data = {[0] = m_st, m_ts}
local mod = {[0] = "st", "ts"}
local var = {[0] = "Simp.", "Trad."}
local sc = {[0] = Hans, Hant}
local function find_stable_loop(chars, other, j)
local display = ruby({["markup"] = "[" .. other .. "](" .. var[(j+1)%2] .. ")"})
display = language_link{term = other, alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display)
if data[(j+1)%2][other] == other then
insert(chars, other)
return chars, 1
elseif not data[(j+1)%2][other] then
insert(chars, "not found")
return chars, 2
elseif data[j%2][data[(j+1)%2][other]] ~= other then
return find_stable_loop(chars, data[(j+1)%2][other], j + 1)
else
local display = ruby({["markup"] = "[" .. data[(j+1)%2][other] .. "](" .. var[j%2] .. ")"})
display = language_link{term = data[(j+1)%2][other], alt = display, lang = lang, sc = sc[j%2], tr = "-"}
insert(chars, display .. " (")
display = ruby({["markup"] = "[" .. data[j%2][data[(j+1)%2][other]] .. "](" .. var[(j+1)%2] .. ")"})
display = language_link{term = data[j%2][data[(j+1)%2][other]], alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display .. " etc.)")
return chars, 3
end
end
for i = 0, 1, 1 do
for ch, other_ch in pairs(data[i]) do
if data[(i+1)%2][other_ch] ~= ch then
local chars, issue = {}
local display = ruby({["markup"] = "[" .. ch .. "](" .. var[i] .. ")"})
display = language_link{term = ch, alt = display, lang = lang, sc = sc[i], tr = "-"}
insert(chars, display)
chars, issue = find_stable_loop(chars, other_ch, i)
if issue == 1 or issue == 2 then
local sc_this, mod_this, j = {}
if match(chars[#chars-1], var[(i+1)%2]) then
j = 1
else
j = 0
end
mod_this = mod[(i+j)%2]
sc_this = {[0] = sc[(i+j)%2], sc[(i+j+1)%2]}
for k, ch in ipairs(chars) do
chars[k] = tag_text(ch, lang, sc_this[k%2], "term")
end
local modname = "zh/data/" .. mod_this
if issue == 1 then
discrepancy(modname,
"character references itself: %s",
concat(chars, " → ")
)
elseif issue == 2 then
discrepancy(modname,
"missing character: %s",
concat(chars, " → ")
)
end
elseif issue == 3 then
for j, ch in ipairs(chars) do
chars[j] = tag_text(ch, lang, sc[(i+j)%2], "term")
end
discrepancy("zh/data/" .. mod[i],
"possible mismatched character: %s",
concat(chars, " → ")
)
end
end
end
end
end
local function check_serialization(modname)
local serializers = {
["Hani-sortkey/data/serialized"] = "Hani-sortkey/serializer",
}
if not serializers[modname] then
return nil
end
local serializer = serializers[modname]
local current_data = require("Module:" .. serializer).main(true)
local stored_data = require("Module:" .. modname)
if current_data ~= stored_data then
discrepancy(modname,
"<strong><u>Important!</u> Serialized data is out of sync. Use [[Module:%s]] to update it. If you have made any changes to the underlying data, the serialized data <u>must</u> be updated before these changes will take effect.</strong>",
serializer
)
end
end
local find_code = require("Module:memoize")(function(message)
return match(message, "<code>([^<]+)</code>")
end)
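local function compare_messages(message1, message2)
local code1, code2 = find_code(message1), find_code(message2)
-- Sort by the first <code>...</code> value when both messages have one and
-- the values differ; otherwise fall back to the full message text, so that
-- messages about the same code come out in a deterministic order.
if code1 and code2 and code1 ~= code2 then
return code1 < code2
end
return message1 < message2
end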
-- Warning: cannot be called twice in the same module invocation because
-- some module-global variables are not reset between calls.
local function do_checks(frame, modules)
messages = setmetatable({}, messages_mt)
if modules["zh/data/ts"] or modules["zh/data/st"] then
check_zh_trad_simp()
end
check_languages(frame)
check_etym_languages()
-- families and scripts must be checked AFTER languages; languages checks fill out
-- the nonempty_families and nonempty_scripts tables, used for testing if a family/script
-- is ever used in the data
check_families()
check_scripts()
check_wikidata_languages()
if modules["labels/data"] then
check_labels()
end
for module in pairs(modules) do
check_serialization(module)
end
setmetatable(messages, nil)
for _, msglist in pairs(messages) do
msglist:sort(compare_messages)
end
local ret = messages
messages = nil
return ret
end
local function format_message(modname, msglist)
local header
if match(modname, "^Module:") or match(modname, "^Template:") then
header = "===[[" .. modname .. "]]==="
else
header = "===[[Module:" .. modname .. "]]==="
end
return header .. msglist:map(function(msg)
return "\n* " .. msg
end):concat()
end
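-- Usage sketch for check_modules_t, assuming it is invoked from wikitext
-- under this module's page name (the two module names are examples only):
--
--   {{#invoke:data consistency check|check_modules_t|families/data|scripts/data}}
--
-- Each positional argument is a module name as it is passed to discrepancy(),
-- normally without the "Module:" prefix. The arguments select which modules'
-- messages are printed, and also switch on the optional zh/data and
-- labels/data checks; the language, etymology-language, family, script and
-- Wikidata checks in do_checks run regardless.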
function export.check_modules_t(frame)
local args = frame.args
local modules = list_to_set(args)
local ret = Array()
local messages = do_checks(frame, modules)
for _, module in ipairs(args) do
local msglist = messages[module]
if msglist then
ret:insert(format_message(module, msglist))
end
end
return ret:concat("\n")
end
function export.perform(frame)
local messages = do_checks(frame, {})
-- Format the messages
local ret = Array()
for modname, msglist in sorted_pairs(messages) do
ret:insert(format_message(modname, msglist))
end
-- Are there any messages? An empty table means that no check recorded a
-- discrepancy.
if next(messages) == nil then
return "<b class=\"success\">Glory to Arstotzka.</b>"
else
ret:insert(1, "<b class=\"warning\">Discrepancies detected:</b>")
return ret:concat("\n")
end
end
return export