As an international dictionary, Wiktionary is intended to include “all words in all languages”, subject to the following criteria.

Shortcut:
WT:CFI

General rule

A term should be included if it is likely that someone would come across it and want to know what it means. This leads to a more formal guideline: include a term if it is attested and is either a single word or idiomatic.

Terms

A term need not be limited to a single word in the usual sense. The following are also acceptable:

Attestation

Shortcut:
WT:ATTEST

“Attested” means verified through

  1. widespread use, or
  2. use in permanently recorded media, conveying meaning, in at least three sources that are independent of one another, spanning at least one year (different requirements may apply to certain languages).

Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines are also acceptable, particularly if their contents are indexed online. Other recorded media such as audio and video are acceptable as well, provided they are of verifiable origin and are durably archived. We do not quote other MediaWiki websites (such as Wikipedia), but we may use quotations found on them (such as a quotation from a book available on Wikisource). When citing a quotation from a book, please include the ISBN.

Conveying meaning

See use-mention distinction.

This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.

Number of citations

For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:

  • the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template).[1]

The requirement that citations be independent serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be "independent" if they are in different sentences by different people, and to be non-independent if:

  • one is a verbatim or near-verbatim quotation of the other; or
  • both are verbatim or near-verbatim quotations or translations of a single original source; or
  • both are by the same author.

If two or more usages are not independent of each other, then only one of them can be used for purposes of attestation.[2]
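The independence rule can be sketched as a simple filter. This is only an illustration: the `Citation` record, its `author` and `source_id` fields, and the `independent_citations` function are hypothetical names introduced here, and deciding whether two quotations derive from "a single original source" still requires editorial judgment that the code takes as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One quotation of a term (hypothetical record, not a Wiktionary data type)."""
    author: str
    source_id: str  # identifies the original text being quoted or translated


def independent_citations(citations):
    """Keep at most one citation per author and per original source.

    Greedy, in input order: a citation is skipped if its author or its
    original source has already been counted.
    """
    seen_authors = set()
    seen_sources = set()
    kept = []
    for c in citations:
        if c.author in seen_authors or c.source_id in seen_sources:
            continue  # not independent of a citation already counted
        seen_authors.add(c.author)
        seen_sources.add(c.source_id)
        kept.append(c)
    return kept


cites = [
    Citation("A. Author", "novel-1"),
    Citation("A. Author", "novel-2"),  # same author as the first
    Citation("B. Writer", "novel-1"),  # quotes the same original as the first
    Citation("C. Scribe", "essay-3"),
]
print(len(independent_citations(cites)))  # 2
```

Because the filter is greedy, the order of the input can change which citations are kept; it is meant to show the counting principle, not to pick the best possible set.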

Spanning at least a year

This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
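As a rough illustration, the one-year threshold can be checked by comparing the earliest and latest citation dates. The policy does not say exactly how the year is measured, so the calendar-anniversary reading below, and the function name `spans_at_least_a_year`, are assumptions made for this sketch.

```python
from datetime import date


def spans_at_least_a_year(dates):
    """Return True if the earliest and latest dates are at least one year apart."""
    if len(dates) < 2:
        return False
    first, last = min(dates), max(dates)
    try:
        anniversary = first.replace(year=first.year + 1)
    except ValueError:
        # first is 29 February and the following year has no such day
        anniversary = first.replace(year=first.year + 1, day=28)
    return last >= anniversary


print(spans_at_least_a_year([date(2020, 1, 1), date(2020, 6, 1), date(2021, 1, 1)]))  # True
print(spans_at_least_a_year([date(2020, 1, 1), date(2020, 12, 31)]))                  # False
```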

Idiomaticity

Shortcut:
WT:SOP

An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components. Non-idiomatic expressions are called sum-of-parts (SOP).

For example, this is a door is not idiomatic, but shut up and red herring are.

This criterion is sometimes referred to as the fried egg test, as a fried egg generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn't denote a scrambled egg, which may nonetheless be cooked by frying.

See Wiktionary:Idioms that survived RFD for other examples. However, many idioms are clearly idiomatic, for example red herring. These tests are invoked only in discussion of unclear cases.

Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance, What's your name? is clearly a summation of its parts.

Unidiomatic terms made up of multiple words are included if they are significantly more common than single-word spellings that meet criteria for inclusion; for example, coalmine meets criteria for inclusion, so its more common form coal mine is also included.[3][4]

An attested integer word (such as twenty-three or twenty-third) or a decimal numeral (sequence of 0, ..., 9 digits) that is ≥ 0 and ≤ 100 should be kept even if it is not idiomatic. In sequences of digits such as 125, the digits are considered to be separate components for the purpose of idiomaticity, and therefore, the sequences are often not idiomatic.[5]
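For decimal numerals, the range part of this rule is mechanical and can be sketched as below. The function name `digit_numeral_is_kept` is hypothetical, the check covers only digit sequences (not number words such as twenty-three), and attestation is assumed to have been established separately. How leading zeros should be treated is not addressed by the text, so the sketch simply reads the value.

```python
def digit_numeral_is_kept(numeral: str) -> bool:
    """True if `numeral` is a plain sequence of digits 0-9 with a value from 0 to 100."""
    if not numeral or not all(ch in "0123456789" for ch in numeral):
        return False
    return 0 <= int(numeral) <= 100


print(digit_numeral_is_kept("23"))   # True
print(digit_numeral_is_kept("100"))  # True
print(digit_numeral_is_kept("125"))  # False
```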

In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.

Idiomaticity rules apply to hyphenated compounds in the same way as to spaced phrases. For example, wine-lover, green-haired, harsh-sounding and ex-teacher are all excluded as they mean no more than the sum of their parts, while green-fingered and good-looking are included as idiomatic.

Translation hubs

Shortcut:
WT:THUB

A translation hub (translation target) is a common English multi-word term or collocation that is useful for hosting translations. Some attested translation hubs should be included despite being non-idiomatic and some excluded, but there is no agreement on precise, all-encompassing rules for deciding which are which. Therefore, the following criteria for inclusion of attested non-idiomatic translation hubs are tentative:[6]

  • The attested English term has to be common; rare terms don't qualify.
  • A translation does not qualify to support the English term if it is:
    • a closed compound that is a word-for-word translation of the English term: German Autoschlüssel does not qualify to support the English "car key"; or
    • a multi-word phrase that is a word-for-word translation of the English term; or
    • a diminutive: Spanish mecedorcito does not qualify to support the English "small rocking chair"; or
    • an augmentative: Portuguese amigão does not qualify to support the English "good friend"; or
    • a comparative or a superlative; or
    • a phrase in a language that does not use spaces to separate words.
  • At the very least, two qualifying translations must support the English term. Editor judgment can require a higher number, on a case-by-case basis.
  • The existence of a rare single-word English synonym of the considered English term does not disqualify the considered English term: the existence of Anglistics, which is rare, does not disqualify English studies.
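The tentative criteria above can be read as a checklist. The sketch below encodes that checklist under stated assumptions: the `Translation` record, its flag names, and `may_be_translation_hub` are hypothetical, and every flag (as well as whether the English term is "common") is a judgment an editor supplies, not something the code can determine.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    """A candidate supporting translation, with editor-supplied flags (hypothetical)."""
    term: str
    word_for_word: bool = False  # closed compound or phrase mirroring the English term
    diminutive: bool = False
    augmentative: bool = False
    comparative_or_superlative: bool = False
    phrase_in_unspaced_language: bool = False

    def qualifies(self) -> bool:
        """A translation qualifies only if none of the disqualifying flags is set."""
        return not (
            self.word_for_word
            or self.diminutive
            or self.augmentative
            or self.comparative_or_superlative
            or self.phrase_in_unspaced_language
        )


def may_be_translation_hub(is_common: bool, translations, minimum: int = 2) -> bool:
    """Apply the checklist: a common English term with enough qualifying translations."""
    if not is_common:
        return False
    return sum(t.qualifies() for t in translations) >= minimum


car_key = [Translation("Autoschlüssel", word_for_word=True)]
print(may_be_translation_hub(True, car_key))  # False
```

The `minimum` parameter defaults to two, the floor stated above; editors may require a higher number case by case.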

Numbers, numerals, and ordinals

Numbers, numerals, and ordinals over 100 that are not single words or are sequences of digits should not be included in the dictionary, unless the number, numeral, or ordinal in question has a separate idiomatic sense that meets the CFI.[5]

Misspellings, common misspellings and variant spellings:[7] Rare misspellings should be excluded while common misspellings should be included.[8] There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Combining characters (like this) should exist as main-namespace redirects to their non-combining forms (like this) if the latter exist.[9]

The entries for such inflected forms as cameras, geese, asked, and were should indicate what form they are, and link to the main entry for the word (camera, goose, ask, or be, respectively, for the preceding examples). Except with multi-word idioms, they should not merely redirect.

At entries for inflected forms with idiomatic senses, such as blues and smitten, predictable meanings should be distinguished from idiomatic ones.[10]

Attested repetitive words formed by repeating letters or syllables in other attested words for emphasis, and having no other meaning in any language shall be treated as follows:[11]

  1. Each attested repetitive form that has no more than three repetitions shall have an entry.
  2. Each attested repetitive form that has more than three repetitions shall be hard-redirected to the entry having three repetitions. The three-repetition entry shall have a usage note indicating that additional instances of the letter or syllable may be added for the purely literary effect of indicating emphasis.

An example of repeated letters is the repeated "e" in "pleeease" and "pleeeeeease" compared to "please".

An example of repeated syllables is the repeated "ha" in "hahahahaha" compared to "hahaha".

As for repetition counting: "hahaha" is considered to have three repetitions, while "haha" has two repetitions.

To hard-redirect is to use "#REDIRECT", which immediately takes the reader to the target page.

The above treatment may be overridden by consensus, for example where a variation having four repetitions is more common, or where an additional repetition would cause the word to shift to a different pronunciation or intonation.
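The default treatment of repetitive forms (before any override by consensus) can be sketched as follows. The function name `repetition_action` and its return shape are hypothetical; the caller is assumed to know which letter or syllable is being repeated, and attestation is assumed to have been checked already.

```python
import re


def repetition_action(word: str, unit: str):
    """Decide whether a repetitive form gets its own entry or a hard redirect.

    `unit` is the repeated letter or syllable. Returns ("entry", title) or
    ("redirect", wikitext), where the wikitext points at the three-repetition form.
    """
    if not unit:
        raise ValueError("unit must be a non-empty string")
    runs = re.findall(f"(?:{re.escape(unit)})+", word)
    if not runs:
        raise ValueError(f"{unit!r} does not occur in {word!r}")
    longest = max(runs, key=len)         # the emphasised run
    count = len(longest) // len(unit)    # number of repetitions in that run
    if count <= 3:
        return ("entry", word)
    target = word.replace(longest, unit * 3, 1)
    return ("redirect", f"#REDIRECT [[{target}]]")


print(repetition_action("haha", "ha"))        # ('entry', 'haha')
print(repetition_action("hahahahaha", "ha"))  # ('redirect', '#REDIRECT [[hahaha]]')
print(repetition_action("pleeeeeease", "e"))  # ('redirect', '#REDIRECT [[pleeease]]')
```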

Idiomatic phrases

Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic form, based on the following principles:

Prefer the generic personal pronoun one (one's) when an action is usually reflexive, and someone (someone's) when an action is usually not reflexive. Thus, feel one’s oats (preferable to feel his oats), breathe down someone's neck. Use of other personal pronouns, especially in the singular, should be avoided except where they are essential to the meaning.

Omit an initial article unless it makes a difference in the meaning. E.g., cat’s pajamas instead of the cat’s pajamas.

Use the infinitive form of the verb (but without “to”) for the principal verb of a verbal phrase. Thus for the saying It’s raining cats and dogs, or It was raining cats and dogs, or I think it’s going to rain cats and dogs any minute now, or It’s rained cats and dogs for the last week solid the entry should be (and is) under rain cats and dogs. The other variants are derived by the usual rules of grammar (including the use of it with weather terms and other impersonal verbs).
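Two of these principles (dropping an initial article and substituting a generic pronoun) are mechanical enough to sketch. The helper below is a hypothetical illustration only: it does not reduce verbs to the infinitive, it cannot tell when an article or pronoun is essential to the meaning, and whether an action is "usually reflexive" is supplied by the caller. Words such as "her" are ambiguous between possessive and object uses, so real titles still need an editor's eye.

```python
ARTICLES = {"the", "a", "an"}
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def generic_title(phrase: str, reflexive: bool = True) -> str:
    """Drop a leading article and replace possessive pronouns with a generic one."""
    words = phrase.split()
    if words and words[0].lower() in ARTICLES:
        words = words[1:]
    generic = "one's" if reflexive else "someone's"
    words = [generic if w.lower() in POSSESSIVES else w for w in words]
    return " ".join(words)


print(generic_title("the cat's pajamas"))                      # cat's pajamas
print(generic_title("feel his oats"))                          # feel one's oats
print(generic_title("breathe down his neck", reflexive=False)) # breathe down someone's neck
```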

A proverb entry's title begins with a lowercase letter, whether it is a full sentence or not. The first word may still be capitalized on its own:

Languages to include

Natural languages

All natural languages are acceptable. However, it is important to note that the question of whether a proposed language is considered a living language or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.

Sign languages

Terms in signed languages are acceptable as entries, and should be entered as described in the policy document Wiktionary:About sign languages.[12]

Constructed languages

Constructed languages have not developed naturally, but are the product of conscious effort in the fulfillment of some purpose.

All constructed languages are excluded except for Esperanto, Ido, Interlingua, Interlingue (Occidental), Novial, and Volapük.

Some individual terms from constructed languages have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the constructed language as a whole is included.

Languages originating from literary works should not be included as entries or translations in the main namespace, consistent with the above. However, the following constructed languages should have lexicons in the Appendix namespace: Quenya, Sindarin, Klingon, Orcish[13][14] and Lojban.[15]

Reconstructed languages

Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in the Reconstruction namespace, and are referred to from etymology sections.[16] See Wiktionary:Reconstructed terms.

Fictional universes

Shortcut:
WT:FICTION

Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are independent of reference to that universe may be included only in appendices of words from that universe, and not in the main dictionary space.[17] With respect to names of persons or places from fictional universes, they shall not be included unless they are used out of context in an attributive sense. See examples.

For purposes of defining a single work, a series of books, films, or television episodes by the same author, documenting the exploits of a common set of characters in a fictional universe (e.g. the Harry Potter books, Tolkien's Middle Earth books, the Star Wars films), shall be considered a single work in multiple parts.

The vote "pl-2010-10/Disallowing certain appendices" is relevant to this section, without specifying text to be amended in this document, so please see it for details.[18]

Wiktionary is not an encyclopedia

See also Wiktionary is not an encyclopaedia.

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept.

Wiktionary articles are about words, not about people or places. Articles about the specific places and people belong in Wikipedia.

Sarcastic usage

The straightforward sarcastic use of irony, understatement and hyperbole does not usually qualify for inclusion. This means, for example, that big should not be defined as “(ironic) small”, “(understatement) gigantic” or “(hyperbole) moderately large”. Common rhetorical use can be explained in a usage note, a context tag (such as (usually sarcastic)) or as part of the literal definition. Terms which are seldom or never used literally (e.g. fat chance) are not covered by this rule, and can be included on their own merits.[19]

Language-specific issues

Individual languages may have additional restrictions on inclusion. These will be mentioned on that language's About page. For instance, Wiktionary:English entry guidelines notes that the community has voted[20] to not allow most modern English possessives.[21]

Names fall into several categories, including company names, the names of products, given names, family names, and the full names of specific people, places, and things. Wiktionary classifies all as proper nouns, but applies caveats to each.

Generic terms are common rather than proper nouns. For example: Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed “sandwich” was in the previous sentence).

Shortcut:
WT:BRAND

A brand name for a product or service should be included if it has entered the lexicon.[22][23] Apart from genericized trademarks, this is measured objectively by the brand name’s use in at least three independent durably archived citations spanning a period of at least three years. The sources of these citations:

  1. must be independent of any parties with economic interest in the brand, including the manufacturer, distributors, retailers, marketers, and advertisers, their parent companies, subsidiaries, and affiliates, at time of authorship; and
  2. must not identify any such parties.

If the term has legal protection as a trademark, the original source must not indicate such. The sources also must not be written:

  1. by any person or group associated with the type of product or service;
  2. about any person or group specifically associated with the product or service; or
  3. about the type of product or service in general.

The text preceding and surrounding the citation must not identify the product or service to which the brand name applies, whether by stating explicitly or implicitly some feature or use of the product or service from which its type and purpose may be surmised, or some inherent quality that is necessary for an understanding of the author’s intent. See examples.

Given and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and subject to the same criteria for inclusion as any other words. Wiktionary has main articles giving etymologies, alternative spellings, meanings, and translations for given names and family names, and has two appendices for indexing those articles: Appendix:Names, Appendix:Surnames/A.[24]

For most given names and family names, it is relatively easy to demonstrate that the word fulfills the criteria, as for most given names and family names the name words are in widespread use in both spoken communication and literature. However, being a name per se does not automatically qualify a word for inclusion. A new name, that has not been attested, is still a protologism. A name that occurs only in the works of fiction of a single author, a television series or a video game, or within a closed context such as the works of several authors writing about a single fictional universe, is not used independently and should not be included.

Hypocoristics, diminutives, and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are held to the same standards as names.

Genealogical content

Wiktionary is not a genealogy database. Wiktionary articles on family names, for example, are not intended to be about the people who share the family name. They are about the name as a word. For example: Whilst Yoder will tell the reader that the word originated in Switzerland (as well as give its pronunciations and alternative spellings), it is not intended to include information about the ancestries of people who have the family name Yoder.[25]

Names of specific entities

This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic features, names of celestial objects, names of mythological creatures, names and titles of various works, etc.[26][27] Examples include the Internet, the Magna Carta, the Mona Lisa, the Qur'an, the Red Cross, the Titanic, and World War II.

A name of a specific entity must not be included if it does not meet the attestation requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which. However, policies exist for names of certain kinds of entities. In particular:

  • No individual person should be listed as a sense in any entry whose page title includes both a given name or diminutive and a family name or patronymic. For instance, Walter Elias Disney, the film producer and voice of Mickey Mouse, is not allowed a definition line at Walt Disney.
  • Names of specific companies are subject to the “Company names” section of this page.
  • Names of fictional people and places are subject to the “Fictional universes” section of this page.
  • Place names are subject to the “Place names” section of this page.

Such definitions as are included should be succinct rather than encyclopedic.

Company names

Being a company name does not guarantee inclusion. To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested.

Place names

The following place names should be included as long as they are attested:

  • The names of continents.
  • The names of seas and oceans.
  • The names of countries.
  • The names of areas or regions containing multiple countries (e.g., Middle East, Eurozone).
  • The names of primary administrative divisions (states, provinces, counties, etc).
  • The names of conurbations, cities, towns, villages and hamlets.
  • Districts of towns and cities (e.g., Fulham).
  • The names of inhabited islands and archipelagos.
  • The names of other significant natural geographic features (such as large deserts and major rivers).

The editors have not yet reached a consensus as to whether or not the names of places and geographic features other than those listed above should be included in Wiktionary. There is currently no definition of "significant natural geographic features", but by way of an example, the twenty largest lakes in the world by surface area would each qualify. It is hoped that the editors will develop criteria over time to provide greater clarity and address matters not currently covered (for example the names of streets, buildings, tunnels).[28][29]

Issues to consider

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form.[30]

  1. Wiktionary:Votes/2012-06/Well Documented Languages
  2. Wiktionary:Votes/pl-2012-02/Independence
  3. (WT:COALMINE) Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word
  4. Wiktionary:Votes/2019-08/Rescinding the "Coalmine" policy
  5. Wiktionary:Votes/pl-2017-05/Numbers, numerals, and ordinals
  6. Wiktionary:Votes/pl-2018-03/Including translation hubs
  7. Wiktionary:Votes/pl-2011-02/Renaming CFI section for spellings
  8. Wiktionary:Votes/pl-2014-04/Keeping common misspellings
  9. Wiktionary:Votes/2011-06/Redirecting combining characters
  10. Wiktionary:Votes/pl-2008-08/Inclusion of regular inflected forms
  11. Wiktionary:Votes/2014-01/Treatment of repeating letters and syllables
  12. Wiktionary:Votes/pl-2008-08/Wiktionary:About sign languages
  13. Wiktionary:Votes/pl-2007-04/Fictional languages
  14. Wiktionary:Votes/pl-2017-05/Simplifying CFI about constructed languages
  15. Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix
  16. Wiktionary:Votes/2015-09/Creating a namespace for reconstructed terms
  17. Wiktionary:Votes/pl-2008-01/Appendices for fictional terms
  18. Wiktionary:Votes/pl-2010-10/Disallowing certain appendices
  19. Wiktionary:Votes/pl-2015-03/Excluding most sarcastic usage from CFI
  20. Wiktionary:Votes/pl-2007-07/Exclusion of possessive case
  21. Wiktionary:Votes/pl-2010-01/Removing Modern English possessive forms section from CFI
  22. Wiktionary:Votes/pl-2007-08/Brand names of products 2
  23. Wiktionary:Votes/pl-2012-02/Brand names and physical product 2
  24. Wiktionary:Votes/pl-2010-01/Renaming given name appendixes
  25. Wiktionary:Votes/pl-2010-01/Renaming CFI section on genealogic names
  26. Wiktionary:Votes/pl-2010-05/Names of specific entities
  27. Wiktionary:Votes/pl-2010-12/Names of individuals
  28. Wiktionary:Votes/pl-2017-01/Policy on place names
  29. Wiktionary:Votes/pl-2017-03/CFI and place names cleanup
  30. Wiktionary:Votes/pl-2016-02/Attestation vs. the slippery slope 2