Commit a38cf23

Javadoc
Close HTML tags
1 parent 6bd7fe4 commit a38cf23

9 files changed: +63 -14 lines changed
src/main/java/org/apache/commons/codec/language/bm/BeiderMorseEncoder.java

Lines changed: 8 additions & 4 deletions
@@ -25,12 +25,13 @@
  * <p>
  * Beider-Morse phonetic encodings are optimised for family names. However, they may be useful for a wide range of
  * words.
+ * </p>
  * <p>
  * This encoder is intentionally mutable to allow dynamic configuration through bean properties. As such, it is mutable,
  * and may not be thread-safe. If you require a guaranteed thread-safe encoding then use {@link PhoneticEngine}
  * directly.
- * <p>
- * <b>Encoding overview</b>
+ * </p>
+ * <h2>Encoding overview</h2>
  * <p>
  * Beider-Morse phonetic encodings is a multi-step process. Firstly, a table of rules is consulted to guess what
  * language the word comes from. For example, if it ends in "{@code ault}" then it infers that the word is French.
@@ -42,28 +43,31 @@
  * representation. Again, sometimes there are multiple ways this could be done and sometimes things that can be
  * pronounced in several ways in the source language have only one way to represent them in this average phonetic
  * language, so the result is again a set of phonetic spellings.
+ * </p>
  * <p>
  * Some names are treated as having multiple parts. This can be due to two things. Firstly, they may be hyphenated. In
  * this case, each individual hyphenated word is encoded, and then these are combined end-to-end for the final encoding.
  * Secondly, some names have standard prefixes, for example, "{@code Mac/Mc}" in Scottish (English) names. As
  * sometimes it is ambiguous whether the prefix is intended or is an accident of the spelling, the word is encoded once
  * with the prefix and once without it. The resulting encoding contains one and then the other result.
- * <p>
- * <b>Encoding format</b>
+ * </p>
+ * <h2>Encoding format</h2>
  * <p>
  * Individual phonetic spellings of an input word are represented in upper- and lower-case roman characters. Where there
  * are multiple possible phonetic representations, these are joined with a pipe ({@code |}) character. If multiple
  * hyphenated words where found, or if the word may contain a name prefix, each encoded word is placed in ellipses and
  * these blocks are then joined with hyphens. For example, "{@code d'ortley}" has a possible prefix. The form
  * without prefix encodes to "{@code ortlaj|ortlej}", while the form with prefix encodes to "
  * {@code dortlaj|dortlej}". Thus, the full, combined encoding is "{@code (ortlaj|ortlej)-(dortlaj|dortlej)}".
+ * </p>
  * <p>
  * The encoded forms are often quite a bit longer than the input strings. This is because a single input may have many
  * potential phonetic interpretations. For example, "{@code Renault}" encodes to "
  * {@code rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult}". The {@code APPROX} rules will tend to produce larger
  * encodings as they consider a wider range of possible, approximate phonetic interpretations of the original word.
  * Down-stream applications may wish to further process the encoding for indexing or lookup purposes, for example, by
  * splitting on pipe ({@code |}) and indexing under each of these alternatives.
+ * </p>
  * <p>
  * <b>Note</b>: this version of the Beider-Morse encoding is equivalent with v3.4 of the reference implementation.
  * </p>
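
Reviewer note: the Javadoc above suggests that downstream code split an encoding on the pipe character and index each alternative. A minimal sketch of that post-processing follows, assuming the only structural characters in an encoding are parentheses, hyphens between blocks, and pipes, as in the two examples quoted in the Javadoc. The class and method names (`BmEncodingSplitter`, `parse`) are hypothetical and not part of Commons Codec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Commons Codec: splits an already-encoded
// Beider-Morse string into blocks, each holding its phonetic alternatives.
final class BmEncodingSplitter {

    static List<List<String>> parse(String encoded) {
        List<List<String>> blocks = new ArrayList<>();
        // Assumption: multi-part encodings look like "(a|b)-(c|d)";
        // single-part encodings have no parentheses at all.
        String[] parts = encoded.startsWith("(")
                ? encoded.split("\\)-\\(")
                : new String[] { encoded };
        for (String part : parts) {
            // Drop the outer parentheses left over from the split.
            String body = part.replace("(", "").replace(")", "");
            // Alternatives within one block are joined with '|'.
            blocks.add(Arrays.asList(body.split("\\|")));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Both inputs are the example encodings quoted in the Javadoc.
        System.out.println(parse("(ortlaj|ortlej)-(dortlaj|dortlej)"));
        System.out.println(parse("rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult"));
    }
}
```

Each inner list can then be indexed under every alternative it contains.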

src/main/java/org/apache/commons/codec/language/bm/Lang.java

Lines changed: 9 additions & 2 deletions
@@ -36,18 +36,23 @@
  * <p>
  * This class encapsulates rules used to guess the possible languages that a word originates from. This is
  * done by reference to a whole series of rules distributed in resource files.
+ * </p>
  * <p>
  * Instances of this class are typically managed through the static factory method instance().
  * Unless you are developing your own language guessing rules, you will not need to interact with this class directly.
+ * </p>
  * <p>
  * This class is intended to be immutable and thread-safe.
- * <p>
- * <b>Lang resources</b>
+ * </p>
+ * <h2>Lang resources</h2>
  * <p>
  * Language guessing rules are typically loaded from resource files. These are UTF-8 encoded text files.
  * They are systematically named following the pattern:
+ * </p>
  * <blockquote>org/apache/commons/codec/language/bm/lang.txt</blockquote>
+ * <p>
  * The format of these resources is the following:
+ * </p>
  * <ul>
  * <li><b>Rules:</b> whitespace separated strings.
  * There should be 3 columns to each row, and these will be interpreted as:
@@ -65,6 +70,7 @@
  * </ul>
  * <p>
  * Port of lang.php
+ * </p>
  *
  * @since 1.6
  */
@@ -119,6 +125,7 @@ public static Lang instance(final NameType nameType) {
  * <p>
  * In normal use, you will obtain instances of Lang through the {@link #instance(NameType)} method.
  * You will only need to call this yourself if you are developing custom language mapping rules.
+ * </p>
  *
  * @param languageRulesResourceName
  * the fully-qualified resource name to load

src/main/java/org/apache/commons/codec/language/bm/Languages.java

Lines changed: 4 additions & 0 deletions
@@ -33,10 +33,12 @@
  * <p>
  * Language codes are typically loaded from resource files. These are UTF-8
  * encoded text files. They are systematically named following the pattern:
+ * </p>
  * <blockquote>org/apache/commons/codec/language/bm/${{@link NameType#getName()}
  * languages.txt</blockquote>
  * <p>
  * The format of these resources is the following:
+ * </p>
  * <ul>
  * <li><b>Language:</b> a single string containing no whitespace</li>
  * <li><b>End-of-line comments:</b> Any occurrence of '//' will cause all text
@@ -48,8 +50,10 @@
  * </ul>
  * <p>
  * Ported from language.php
+ * </p>
  * <p>
  * This class is immutable and thread-safe.
+ * </p>
  *
  * @since 1.6
  */

src/main/java/org/apache/commons/codec/language/bm/NameType.java

Lines changed: 9 additions & 3 deletions
@@ -26,13 +26,19 @@
  */
  public enum NameType {

- /** Ashkenazi family names */
+ /**
+ * Ashkenazi family names.
+ */
  ASHKENAZI("ash"),

- /** Generic names and words */
+ /**
+ * Generic names and words.
+ */
  GENERIC("gen"),

- /** Sephardic family names */
+ /**
+ * Sephardic family names.
+ */
  SEPHARDIC("sep");

  private final String name;

src/main/java/org/apache/commons/codec/language/bm/PhoneticEngine.java

Lines changed: 5 additions & 0 deletions
@@ -41,12 +41,15 @@
  * into account the likely source language. Next, this phonetic representation is converted into a
  * pan-European 'average' representation, allowing comparison between different versions of essentially
  * the same word from different languages.
+ * </p>
  * <p>
  * This class is intentionally immutable and thread-safe.
  * If you wish to alter the settings for a PhoneticEngine, you
  * must make a new one with the updated settings.
+ * </p>
  * <p>
  * Ported from phoneticengine.php
+ * </p>
  *
  * @since 1.6
  */
@@ -97,6 +100,7 @@ public void append(final CharSequence str) {
  * <p>
  * This will lengthen phonemes that have compatible language sets to the expression, and drop those that are
  * incompatible.
+ * </p>
  *
  * @param phonemeExpr the expression to apply
  * @param maxPhonemes the maximum number of phonemes to build up
@@ -237,6 +241,7 @@ public boolean isFound() {

  /**
  * Joins some strings with an internal separator.
+ *
  * @param strings Strings to join
  * @param sep String to separate them with
  * @return a single String consisting of each element of {@code strings} interleaved by {@code sep}

src/main/java/org/apache/commons/codec/language/bm/Rule.java

Lines changed: 6 additions & 2 deletions
@@ -39,6 +39,7 @@
  * <p>
  * Rules have a pattern, left context, right context, output phoneme, set of languages for which they apply
  * and a logical flag indicating if all languages must be in play. A rule matches if:
+ * </p>
  * <ul>
  * <li>the pattern matches at the current position</li>
  * <li>the string up until the beginning of the pattern matches the left context</li>
@@ -49,16 +50,19 @@
  * <p>
  * Rules are typically generated by parsing rules resources. In normal use, there will be no need for the user
  * to explicitly construct their own.
+ * </p>
  * <p>
  * Rules are immutable and thread-safe.
- * <p>
- * <b>Rules resources</b>
+ * </p>
+ * <h2>Rules resources</h2>
  * <p>
  * Rules are typically loaded from resource files. These are UTF-8 encoded text files. They are systematically
  * named following the pattern:
+ * </p>
  * <blockquote>org/apache/commons/codec/language/bm/${NameType#getName}_${RuleType#getName}_${language}.txt</blockquote>
  * <p>
  * The format of these resources is the following:
+ * </p>
  * <ul>
  * <li><b>Rules:</b> whitespace separated, double-quoted strings. There should be 4 columns to each row, and these
  * will be interpreted as:

src/main/java/org/apache/commons/codec/language/bm/RuleType.java

Lines changed: 11 additions & 3 deletions
@@ -24,11 +24,19 @@
  */
  public enum RuleType {

- /** Approximate rules, which will lead to the largest number of phonetic interpretations. */
+ /**
+ * Approximate rules, which will lead to the largest number of phonetic interpretations.
+ */
  APPROX("approx"),
- /** Exact rules, which will lead to a minimum number of phonetic interpretations. */
+
+ /**
+ * Exact rules, which will lead to a minimum number of phonetic interpretations.
+ */
  EXACT("exact"),
- /** For internal use only. Please use {@link #APPROX} or {@link #EXACT}. */
+
+ /**
+ * For internal use only. Please use {@link #APPROX} or {@link #EXACT}.
+ */
  RULES("rules");

  private final String name;
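
Reviewer note: the Rule Javadoc gives the resource naming pattern `${NameType#getName}_${RuleType#getName}_${language}.txt`, and the two enums above supply the short names ("ash", "gen", "sep" and "approx", "exact", "rules"). A minimal sketch of how such a path could be assembled is below. The class `BmResourceNames`, its method, and the language value "french" are illustrative assumptions; the library's own loading code is not shown in this diff.

```java
// Hypothetical helper, not part of Commons Codec: builds a rules resource
// path from the naming pattern quoted in the Rule Javadoc.
final class BmResourceNames {

    // Base path as quoted in the Javadoc.
    private static final String BASE = "org/apache/commons/codec/language/bm/";

    static String rulesResource(String nameType, String ruleType, String language) {
        // Pattern: ${NameType#getName}_${RuleType#getName}_${language}.txt
        return String.format("%s%s_%s_%s.txt", BASE, nameType, ruleType, language);
    }

    public static void main(String[] args) {
        // "gen" and "approx" come from the enums in this commit;
        // "french" is an assumed language name used only for illustration.
        System.out.println(rulesResource("gen", "approx", "french"));
    }
}
```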

src/main/java/org/apache/commons/codec/net/QuotedPrintableCodec.java

Lines changed: 8 additions & 0 deletions
@@ -247,6 +247,7 @@ private static boolean isWhitespace(final int b) {
  * <p>
  * This function implements a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param printable
  * bitset of characters deemed quoted-printable
@@ -264,6 +265,7 @@ public static final byte[] encodeQuotedPrintable(final BitSet printable, final b
  * Depending on the selection of the {@code strict} parameter, this function either implements the full ruleset
  * or only a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param printable
  * bitset of characters deemed quoted-printable
@@ -347,6 +349,7 @@ public static final byte[] encodeQuotedPrintable(BitSet printable, final byte[]
  * <p>
  * This function fully implements the quoted-printable encoding specification (rule #1 through rule #5) as
  * defined in RFC 1521.
+ * </p>
  *
  * @param bytes
  * array of quoted-printable characters
@@ -387,6 +390,7 @@ public static final byte[] decodeQuotedPrintable(final byte[] bytes) throws Deco
  * Depending on the selection of the {@code strict} parameter, this function either implements the full ruleset
  * or only a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param bytes
  * array of bytes to be encoded
@@ -403,6 +407,7 @@ public byte[] encode(final byte[] bytes) {
  * <p>
  * This function fully implements the quoted-printable encoding specification (rule #1 through rule #5) as
  * defined in RFC 1521.
+ * </p>
  *
  * @param bytes
  * array of quoted-printable characters
@@ -421,6 +426,7 @@ public byte[] decode(final byte[] bytes) throws DecoderException {
  * Depending on the selection of the {@code strict} parameter, this function either implements the full ruleset
  * or only a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param sourceStr
  * string to convert to quoted-printable form
@@ -571,6 +577,7 @@ public String getDefaultCharset() {
  * Depending on the selection of the {@code strict} parameter, this function either implements the full ruleset
  * or only a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param sourceStr
  * string to convert to quoted-printable form
@@ -592,6 +599,7 @@ public String encode(final String sourceStr, final Charset sourceCharset) {
  * Depending on the selection of the {@code strict} parameter, this function either implements the full ruleset
  * or only a subset of quoted-printable encoding specification (rule #1 and rule #2) as defined in
  * RFC 1521 and is suitable for encoding binary data and unformatted text.
+ * </p>
  *
  * @param sourceStr
  * string to convert to quoted-printable form
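
Reviewer note: several of the comments above refer to "rule #1 and rule #2" of RFC 1521 without spelling them out. Rule #1 allows any octet to be written as `=` followed by two uppercase hex digits; rule #2 allows octets 33 to 60 and 62 to 126 to be written literally. The sketch below applies only those two rules. It is a standalone illustration (the class `QuotedPrintableSketch` is hypothetical), not the `QuotedPrintableCodec` implementation, which takes a configurable `printable` bitset and may treat whitespace differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of RFC 1521 quoted-printable rules #1 and #2 only.
// No line-length limits or whitespace handling (rules #3 to #5) are applied.
final class QuotedPrintableSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int octet = b & 0xFF; // treat the byte as unsigned
            // Rule #2: printable ASCII except '=' (61) may appear literally.
            boolean literal = (octet >= 33 && octet <= 60) || (octet >= 62 && octet <= 126);
            if (literal) {
                out.append((char) octet);
            } else {
                // Rule #1: any octet may be written as '=' plus two hex digits.
                out.append('=').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a=b".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(encode("caf\u00e9".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In this sketch a space is escaped as `=20`, because rule #2 as quoted covers only octets 33 and above.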

src/main/java/org/apache/commons/codec/net/RFC1522Codec.java

Lines changed: 3 additions & 0 deletions
@@ -56,6 +56,7 @@ abstract class RFC1522Codec {
  * <p>
  * This method constructs the "encoded-word" header common to all the RFC 1522 codecs and then invokes
  * {@link #doEncoding(byte[])} method of a concrete class to perform the specific encoding.
+ * </p>
  *
  * @param text
  * a string to encode
@@ -86,6 +87,7 @@ protected String encodeText(final String text, final Charset charset) throws Enc
  * <p>
  * This method constructs the "encoded-word" header common to all the RFC 1522 codecs and then invokes
  * {@link #doEncoding(byte[])} method of a concrete class to perform the specific encoding.
+ * </p>
  *
  * @param text
  * a string to encode
@@ -112,6 +114,7 @@ protected String encodeText(final String text, final String charsetName)
  * <p>
  * This method processes the "encoded-word" header common to all the RFC 1522 codecs and then invokes
  * {@link #doDecoding(byte[])} method of a concrete class to perform the specific decoding.
+ * </p>
  *
  * @param text
  * a string to decode
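
Reviewer note: the "encoded-word" header mentioned in these comments has the RFC 1522 shape `=?charset?encoding?encoded-text?=`. The sketch below builds one with the "B" (Base64) encoding using only the JDK. The class `EncodedWordSketch` and its method are hypothetical and stand in for what a concrete subclass's `doEncoding(byte[])` would contribute; they are not the `RFC1522Codec` code itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the RFC 1522 encoded-word layout:
// "=?" charset "?" encoding "?" encoded-text "?="
final class EncodedWordSketch {

    static String encodeB(String text, Charset charset) {
        // The payload step is what a concrete codec would supply; Base64 here.
        String payload = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + payload + "?=";
    }

    public static void main(String[] args) {
        System.out.println(encodeB("hello", StandardCharsets.UTF_8));
    }
}
```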
