
Commit 357562b

New structure for the user guide using subsections. More documentation on how to define headers.
git-svn-id: https://svn.apache.org/repos/asf/commons/proper/csv/trunk@1742415 13f79535-47bb-0310-9956-ffa450edef68
1 parent 3a3b483 commit 357562b

1 file changed

Lines changed: 135 additions & 42 deletions


src/site/xdoc/user-guide.xml

@@ -16,29 +16,53 @@ See the License for the specific language governing permissions and
1616
limitations under the License.
1717
-->
1818
<document>
19-
<properties>
20-
<title>User Guide</title>
21-
<author email="dev@commons.apache.org">Commons Documentation Team</author>
22-
</properties>
23-
<body>
24-
<!-- ================================================== -->
25-
<section name="Parsing an Excel CSV File">
26-
<p>To parse an Excel CSV file, write:</p>
27-
<source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
19+
<properties>
20+
<title>User Guide</title>
21+
<author email="dev@commons.apache.org">Commons Documentation Team</author>
22+
</properties>
23+
<body>
24+
<!-- ================================================== -->
25+
26+
<h1>Apache Commons CSV User Guide</h1>
27+
28+
<macro name="toc">
29+
</macro>
30+
31+
<section name="Parsing files">
32+
33+
Parsing files with Apache Commons CSV is relatively straightforward.
34+
The CSVFormat class provides some commonly used CSV variants:
35+
36+
<dl>
37+
<dt>RFC4180</dt><dd>The format defined by <a href="https://tools.ietf.org/html/rfc4180">RFC 4180</a></dd>
38+
<dt>MYSQL</dt><dd>The format used by MySQL databases</dd>
39+
<dt>TDF</dt><dd>A tab-delimited format</dd>
40+
<dt>EXCEL</dt><dd>The format used by Excel</dd>
41+
</dl>
42+
43+
<subsection name="Example: Parsing an Excel CSV File">
44+
<p>To parse an Excel CSV file, write:</p>
45+
<source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
2846
Iterable&lt;CSVRecord&gt; records = CSVFormat.EXCEL.withHeader().parse(in); // withHeader() reads the column names from the first record
2947
for (CSVRecord record : records) {
3048
String lastName = record.get("Last Name");
3149
String firstName = record.get("First Name");
32-
}</source>
33-
</section>
34-
<section name="Handling Byte Order Marks">
35-
<p>
36-
To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to deal with these optional bytes.
37-
You can use the
38-
<a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">BOMInputStream</a>
39-
class from <a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a> for example:
40-
</p>
41-
<source>final URL url = ...;
50+
}
51+
</source>
52+
</subsection>
53+
<subsection name="Handling Byte Order Marks">
54+
<p>
55+
To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to
56+
deal with these optional bytes.
57+
You can use the
58+
<a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">
59+
BOMInputStream
60+
</a>
61+
class from
62+
<a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a>
63+
for example:
64+
</p>
65+
<source>final URL url = ...;
4266
final Reader reader = new InputStreamReader(new BOMInputStream(url.openStream()), "UTF-8");
4367
final CSVParser parser = new CSVParser(reader, CSVFormat.EXCEL.withHeader());
4468
try {
@@ -49,29 +73,98 @@ try {
4973
} finally {
5074
parser.close();
5175
reader.close();
52-
}</source>
53-
<p>
54-
You might find it handy to create something like this:
55-
</p>
56-
<source>/**
57-
* Creates a reader capable of handling BOMs.
58-
*/
76+
}
77+
</source>
78+
<p>
79+
You might find it handy to create something like this:
80+
</p>
81+
<source>/**
82+
* Creates a reader capable of handling BOMs.
83+
*/
5984
public InputStreamReader newReader(final InputStream inputStream) {
6085
return new InputStreamReader(new BOMInputStream(inputStream), StandardCharsets.UTF_8);
61-
}</source>
62-
</section>
63-
<section name="Printing with headers">
64-
<p>
65-
To print a CSV file with headers, you specify the headers in the format:
66-
</p>
67-
<source>final Appendable out = ...;
68-
final CSVPrinter printer = CSVFormat.DEFAULT.withHeader("H1", "H2").print(out)</source>
69-
<p>
70-
To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
71-
</p>
72-
<source>final ResultSet resultSet = ...;
73-
final CSVPrinter printer = CSVFormat.DEFAULT.withHeader(resultSet).print(out)</source>
74-
</section>
75-
<!-- ================================================== -->
76-
</body>
86+
}
87+
</source>
88+
</subsection>
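When Commons IO is not available, the BOM-skipping step can be approximated with the JDK alone. The following is a minimal sketch under stated assumptions, not the BOMInputStream implementation: the class name BomSkippingReaders and the method newUtf8Reader are hypothetical, and only the UTF-8 BOM (bytes EF BB BF) is recognized.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Commons CSV or Commons IO. */
final class BomSkippingReaders {

    private BomSkippingReaders() {
    }

    /** Returns a UTF-8 reader positioned after a leading UTF-8 BOM, if one is present. */
    static Reader newUtf8Reader(final InputStream inputStream) throws IOException {
        final PushbackInputStream in = new PushbackInputStream(inputStream, 3);
        final byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < head.length) {
            final int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        final boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            in.unread(head, 0, n);
        }
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }
}
```

The returned Reader can be passed to CSVFormat.EXCEL.parse(...) in the same way as the reader built with BOMInputStream. BOMInputStream remains the more complete choice, since it can also detect other encodings' BOMs.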
89+
</section>
90+
91+
<section name="Working with headers">
92+
93+
Apache Commons CSV provides several ways to access record values.
94+
The simplest way is to access values by their index in the record.
95+
However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc.
96+
The CSVFormat class provides an API for specifying these <i>header</i> names, and CSVRecord
97+
provides methods to access values by their corresponding header name.
98+
99+
<subsection name="Accessing column values by index">
100+
To access a record value by index, no special configuration of the CSVFormat is necessary:
101+
<source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
102+
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.parse(in);
103+
for (CSVRecord record : records) {
104+
String columnOne = record.get(0);
105+
String columnTwo = record.get(1);
106+
}
107+
</source>
108+
</subsection>
109+
<subsection name="Defining a header manually">
110+
Indices may not be the most intuitive way to access record values. For this reason it is possible to
111+
assign names to each column in the file:
112+
<source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
113+
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.withHeader("ID", "CustomerNo", "Name").parse(in);
114+
for (CSVRecord record : records) {
115+
String id = record.get("ID");
116+
String customerNo = record.get("CustomerNo");
117+
String name = record.get("Name");
118+
}
119+
</source>
120+
Note that column values can still be accessed using their index.
121+
</subsection>
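Conceptually, a header is a mapping from column name to column index, which is why a value stays reachable both by name and by position. The sketch below illustrates that idea in plain Java; HeaderLookupSketch is a hypothetical class for illustration and does not reflect how CSVRecord is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Apache Commons CSV. */
final class HeaderLookupSketch {

    // Maps each header name to its zero-based column index.
    private final Map<String, Integer> headerMap = new LinkedHashMap<>();
    private final String[] values;

    HeaderLookupSketch(final String[] header, final String[] values) {
        for (int i = 0; i < header.length; i++) {
            headerMap.put(header[i], i);
        }
        this.values = values.clone();
    }

    /** Access by position, as with record.get(0). */
    String get(final int index) {
        return values[index];
    }

    /** Access by name: resolve the name to an index, then read that position. */
    String get(final String name) {
        final Integer index = headerMap.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No column named " + name);
        }
        return values[index];
    }
}
```

With the header "ID", "CustomerNo", "Name", a lookup of "CustomerNo" and a lookup of index 1 return the same value in this sketch.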
122+
<subsection name="Using an enum to define a header">
123+
Using String values all over the code to reference columns can be error-prone. For this reason,
124+
it is possible to define an enum to specify header names. Note that the enum constant names are
125+
used to access column values. This may lead to enum constant names that do not follow the Java
126+
coding standard of defining constants in upper case with underscores:
127+
<source>public enum Headers {
128+
ID, CustomerNo, Name
129+
}
130+
Reader in = new FileReader(&quot;path/to/file.csv&quot;);
131+
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.withHeader(Headers.class).parse(in);
132+
for (CSVRecord record : records) {
133+
String id = record.get(Headers.ID);
134+
String customerNo = record.get(Headers.CustomerNo);
135+
String name = record.get(Headers.Name);
136+
}
137+
</source>
138+
Again, it is possible to access values by their index or by using a String (for example "CustomerNo").
139+
</subsection>
140+
<subsection name="Header auto detection">
141+
Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse
142+
the header names from the first record:
143+
<source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
144+
Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.withFirstRowAsHeader().parse(in);
145+
for (CSVRecord record : records) {
146+
String id = record.get("ID");
147+
String customerNo = record.get("CustomerNo");
148+
String name = record.get("Name");
149+
}
150+
</source>
151+
This will use the values from the first record as header names and skip the first record when iterating.
152+
</subsection>
153+
<subsection name="Printing with headers">
154+
<p>
155+
To print a CSV file with headers, you specify the headers in the format:
156+
</p>
157+
<source>final Appendable out = ...;
158+
final CSVPrinter printer = CSVFormat.DEFAULT.withHeader("H1", "H2").print(out);
159+
</source>
160+
<p>
161+
To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
162+
</p>
163+
<source>final ResultSet resultSet = ...;
164+
final CSVPrinter printer = CSVFormat.DEFAULT.withHeader(resultSet).print(out);
165+
</source>
166+
</subsection>
167+
</section>
168+
<!-- ================================================== -->
169+
</body>
77170
</document>
