Commit 9b140df

Checkstyle: Line is longer than 120 characters.
git-svn-id: https://svn.apache.org/repos/asf/commons/proper/codec/trunk@1634435 13f79535-47bb-0310-9956-ffa450edef68
1 parent 30c5463 commit 9b140df

1 file changed

Lines changed: 26 additions & 26 deletions

src/main/java/org/apache/commons/codec/language/bm/BeiderMorseEncoder.java

@@ -23,44 +23,44 @@
 /**
  * Encodes strings into their Beider-Morse phonetic encoding.
  * <p>
- * Beider-Morse phonetic encodings are optimised for family names. However, they may be useful for a wide range
- * of words.
+ * Beider-Morse phonetic encodings are optimised for family names. However, they may be useful for a wide range of
+ * words.
  * <p>
- * This encoder is intentionally mutable to allow dynamic configuration through bean properties. As such, it
- * is mutable, and may not be thread-safe. If you require a guaranteed thread-safe encoding then use
- * {@link PhoneticEngine} directly.
+ * This encoder is intentionally mutable to allow dynamic configuration through bean properties. As such, it is mutable,
+ * and may not be thread-safe. If you require a guaranteed thread-safe encoding then use {@link PhoneticEngine}
+ * directly.
  * <p>
  * <b>Encoding overview</b>
  * <p>
  * Beider-Morse phonetic encodings is a multi-step process. Firstly, a table of rules is consulted to guess what
  * language the word comes from. For example, if it ends in "<code>ault</code>" then it infers that the word is French.
- * Next, the word is translated into a phonetic representation using a language-specific phonetics table. Some
- * runs of letters can be pronounced in multiple ways, and a single run of letters may be potentially broken up
- * into phonemes at different places, so this stage results in a set of possible language-specific phonetic
- * representations. Lastly, this language-specific phonetic representation is processed by a table of rules that
- * re-writes it phonetically taking into account systematic pronunciation differences between languages, to move
- * it towards a pan-indo-european phonetic representation. Again, sometimes there are multiple ways this could be
- * done and sometimes things that can be pronounced in several ways in the source language have only one way to
- * represent them in this average phonetic language, so the result is again a set of phonetic spellings.
+ * Next, the word is translated into a phonetic representation using a language-specific phonetics table. Some runs of
+ * letters can be pronounced in multiple ways, and a single run of letters may be potentially broken up into phonemes at
+ * different places, so this stage results in a set of possible language-specific phonetic representations. Lastly, this
+ * language-specific phonetic representation is processed by a table of rules that re-writes it phonetically taking into
+ * account systematic pronunciation differences between languages, to move it towards a pan-indo-european phonetic
+ * representation. Again, sometimes there are multiple ways this could be done and sometimes things that can be
+ * pronounced in several ways in the source language have only one way to represent them in this average phonetic
+ * language, so the result is again a set of phonetic spellings.
  * <p>
- * Some names are treated as having multiple parts. This can be due to two things. Firstly, they may be hyphenated.
- * In this case, each individual hyphenated word is encoded, and then these are combined end-to-end for the final
- * encoding. Secondly, some names have standard prefixes, for example, "<code>Mac/Mc</code>" in Scottish (English)
- * names. As sometimes it is ambiguous whether the prefix is intended or is an accident of the spelling, the word
- * is encoded once with the prefix and once without it. The resulting encoding contains one and then the other result.
+ * Some names are treated as having multiple parts. This can be due to two things. Firstly, they may be hyphenated. In
+ * this case, each individual hyphenated word is encoded, and then these are combined end-to-end for the final encoding.
+ * Secondly, some names have standard prefixes, for example, "<code>Mac/Mc</code>" in Scottish (English) names. As
+ * sometimes it is ambiguous whether the prefix is intended or is an accident of the spelling, the word is encoded once
+ * with the prefix and once without it. The resulting encoding contains one and then the other result.
  * <p>
  * <b>Encoding format</b>
  * <p>
- * Individual phonetic spellings of an input word are represented in upper- and lower-case roman characters. Where
- * there are multiple possible phonetic representations, these are joined with a pipe (<code>|</code>) character.
- * If multiple hyphenated words where found, or if the word may contain a name prefix, each encoded word is placed
- * in elipses and these blocks are then joined with hyphens. For example, "<code>d'ortley</code>" has a possible
- * prefix. The form without prefix encodes to "<code>ortlaj|ortlej</code>", while the form with prefix encodes to
- * "<code>dortlaj|dortlej</code>". Thus, the full, combined encoding is "<code>(ortlaj|ortlej)-(dortlaj|dortlej)</code>".
+ * Individual phonetic spellings of an input word are represented in upper- and lower-case roman characters. Where there
+ * are multiple possible phonetic representations, these are joined with a pipe (<code>|</code>) character. If multiple
+ * hyphenated words where found, or if the word may contain a name prefix, each encoded word is placed in elipses and
+ * these blocks are then joined with hyphens. For example, "<code>d'ortley</code>" has a possible prefix. The form
+ * without prefix encodes to "<code>ortlaj|ortlej</code>", while the form with prefix encodes to "
+ * <code>dortlaj|dortlej</code>". Thus, the full, combined encoding is "<code>(ortlaj|ortlej)-(dortlaj|dortlej)</code>".
  * <p>
  * The encoded forms are often quite a bit longer than the input strings. This is because a single input may have many
- * potential phonetic interpretations. For example, "<code>Renault</code>" encodes to
- * "<code>rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult</code>". The <code>APPROX</code> rules will tend to produce larger
+ * potential phonetic interpretations. For example, "<code>Renault</code>" encodes to "
+ * <code>rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult</code>". The <code>APPROX</code> rules will tend to produce larger
  * encodings as they consider a wider range of possible, approximate phonetic interpretations of the original word.
  * Down-stream applications may wish to further process the encoding for indexing or lookup purposes, for example, by
  * splitting on pipe (<code>|</code>) and indexing under each of these alternatives.
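
The "Encoding format" paragraph in this Javadoc describes a string layout that a downstream application could split before indexing. The following is a minimal sketch of such a splitter, assuming the output looks exactly like the two examples quoted in the Javadoc: either a bare pipe-separated list, or parenthesised blocks joined with hyphens. The class `BmEncodingSplitter` and its `split` method are hypothetical names introduced for illustration; they are not part of Commons Codec, and the sketch does not call the encoder itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Commons Codec: splits an already-encoded
// Beider-Morse string into blocks of alternative phonetic spellings.
final class BmEncodingSplitter {

    // Returns one inner list per block; each inner list holds the
    // pipe-separated alternatives of that block.
    static List<List<String>> split(String encoded) {
        List<List<String>> blocks = new ArrayList<>();
        if (encoded.isEmpty()) {
            return blocks;
        }
        if (encoded.startsWith("(") && encoded.endsWith(")")) {
            // Assumed multi-block form: "(a|b)-(c|d)".
            String inner = encoded.substring(1, encoded.length() - 1);
            for (String block : inner.split("\\)-\\(")) {
                blocks.add(Arrays.asList(block.split("\\|")));
            }
        } else {
            // Assumed single-block form: "a|b|c".
            blocks.add(Arrays.asList(encoded.split("\\|")));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Example strings taken from the Javadoc in this commit.
        System.out.println(split("(ortlaj|ortlej)-(dortlaj|dortlej)"));
        System.out.println(split("rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult"));
    }
}
```

Under these assumptions, the `d'ortley` example yields two blocks of two alternatives each, and the `Renault` example yields one block of six alternatives. An index could then store the input name under every alternative in every block.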
