@@ -16,29 +16,53 @@ See the License for the specific language governing permissions and
1616limitations under the License.
1717-->
1818<document >
19- <properties >
20- <title >User Guide</title >
21- <author email =" dev@commons.apache.org" >Commons Documentation Team</author >
22- </properties >
23- <body >
24- <!-- ================================================== -->
25- <section name =" Parsing an Excel CSV File" >
26- <p >To parse an Excel CSV file, write:</p >
27- <source >Reader in = new FileReader("path/to/file.csv");
19+ <properties >
20+ <title >User Guide</title >
21+ <author email =" dev@commons.apache.org" >Commons Documentation Team</author >
22+ </properties >
23+ <body >
24+ <!-- ================================================== -->
25+
26+ <h1 >Apache Commons CSV User Guide</h1 >
27+
28+ <macro name =" toc" >
29+ </macro >
30+
31+ <section name =" Parsing files" >
32+
33+ Parsing files with Apache Commons CSV is relatively straightforward.
34+ The CSVFormat class provides some commonly used CSV variants:
35+
36+ <dl >
37+ <dt >RFC4180</dt ><dd >The format defined by <a href =" https://tools.ietf.org/html/rfc4180" >RFC-4180</a ></dd >
38+ <dt >MYSQL</dt ><dd >The format used by MySQL databases</dd >
39+ <dt >TDF</dt ><dd >A tab delimited format</dd >
40+ <dt >EXCEL</dt ><dd >The format used by Excel</dd >
41+ </dl >
42+
43+ <subsection name =" Example: Parsing an Excel CSV File" >
44+ <p >To parse an Excel CSV file, write:</p >
45+ <source >Reader in = new FileReader("path/to/file.csv");
2846Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
2947for (CSVRecord record : records) {
3048 String lastName = record.get("Last Name");
3149 String firstName = record.get("First Name");
32- }</source >
33- </section >
34- <section name =" Handling Byte Order Marks" >
35- <p >
36- To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to deal with these optional bytes.
37- You can use the
38- <a href =" https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html" >BOMInputStream</a >
39- class from <a href =" https://commons.apache.org/proper/commons-io/" >Apache Commons IO</a > for example:
40- </p >
41- <source >final URL url = ...;
50+ }
51+ </source >
52+ </subsection >
53+ <subsection name =" Handling Byte Order Marks" >
54+ <p >
55+ To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to
56+ deal with these optional bytes.
57+ You can use the
58+ <a href =" https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html" >
59+ BOMInputStream
60+ </a >
61+ class from
62+ <a href =" https://commons.apache.org/proper/commons-io/" >Apache Commons IO</a >
63+ for example:
64+ </p >
65+ <source >final URL url = ...;
4266final Reader reader = new InputStreamReader(new BOMInputStream(url.openStream()), "UTF-8");
4367final CSVParser parser = new CSVParser(reader, CSVFormat.EXCEL.withHeader());
4468try {
@@ -49,29 +73,98 @@ try {
4973} finally {
5074 parser.close();
5175 reader.close();
52- }</source >
53- <p >
54- You might find it handy to create something like this:
55- </p >
56- <source >/**
57- * Creates a reader capable of handling BOMs.
58- */
76+ }
77+ </source >
78+ <p >
79+ You might find it handy to create something like this:
80+ </p >
81+ <source >/**
82+ * Creates a reader capable of handling BOMs.
83+ */
5984public InputStreamReader newReader(final InputStream inputStream) {
6085 return new InputStreamReader(new BOMInputStream(inputStream), StandardCharsets.UTF_8);
61- }</source >
62- </section >
63- <section name =" Printing with headers" >
64- <p >
65- To print a CSV file with headers, you specify the headers in the format:
66- </p >
67- <source >final Appendable out = ...;
68- final CSVPrinter printer = CSVFormat.DEFAULT.withHeader("H1", "H2").print(out)</source >
69- <p >
70- To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
71- </p >
72- <source >final ResultSet resultSet = ...;
73- final CSVPrinter printer = CSVFormat.DEFAULT.withHeader(resultSet).print(out)</source >
74- </section >
75- <!-- ================================================== -->
76- </body >
86+ }
87+ </source >
88+ </subsection >
89+ </section >
90+
91+ <section name =" Working with headers" >
92+
93+ Apache Commons CSV provides several ways to access record values.
94+ The simplest way is to access values by their index in the record.
95+ However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc.
96+ The CSVFormat class provides an API for specifying these <i >header</i > names, and CSVRecord
97+ provides methods to access values by their corresponding header name.
98+
99+ <subsection name =" Accessing column values by index" >
100+ To access a record value by index, no special configuration of the CSVFormat is necessary:
101+ <source >Reader in = new FileReader("path/to/file.csv");
102+ Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
103+ for (CSVRecord record : records) {
104+ String columnOne = record.get(0);
105+ String columnTwo = record.get(1);
106+ }
107+ </source >
108+ </subsection >
109+ <subsection name =" Defining a header manually" >
110+ Indices may not be the most intuitive way to access record values. For this reason, it is possible to
111+ assign a name to each column in the file:
112+ <source >Reader in = new FileReader("path/to/file.csv");
113+ Iterable<CSVRecord> records = CSVFormat.RFC4180.withHeader("ID", "CustomerNo", "Name").parse(in);
114+ for (CSVRecord record : records) {
115+ String id = record.get("ID");
116+ String customerNo = record.get("CustomerNo");
117+ String name = record.get("Name");
118+ }
119+ </source >
120+ Note that column values can still be accessed using their index.
121+ </subsection >
122+ <subsection name =" Using an enum to define a header" >
123+ Using String values all over the code to reference columns can be error-prone. For this reason,
124+ it is possible to define an enum to specify header names. Note that the enum constant names are
125+ used to access column values. This may lead to enum constant names that do not follow the Java
126+ coding convention of defining constants in upper case with underscores:
127+ <source >public enum Headers {
128+ ID, CustomerNo, Name
129+ }
130+ Reader in = new FileReader("path/to/file.csv");
131+ Iterable<CSVRecord> records = CSVFormat.RFC4180.withHeader(Headers.class).parse(in);
132+ for (CSVRecord record : records) {
133+ String id = record.get(Headers.ID);
134+ String customerNo = record.get(Headers.CustomerNo);
135+ String name = record.get(Headers.Name);
136+ }
137+ </source >
138+ Again, it is possible to access values by their index or by a String (for example "CustomerNo").
139+ </subsection >
140+ <subsection name =" Header auto detection" >
141+ Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse
142+ the header names from the first record:
143+ <source >Reader in = new FileReader("path/to/file.csv");
144+ Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in);
145+ for (CSVRecord record : records) {
146+ String id = record.get("ID");
147+ String customerNo = record.get("CustomerNo");
148+ String name = record.get("Name");
149+ }
150+ </source >
151+ This will use the values from the first record as header names and skip the first record when iterating.
152+ </subsection >
153+ <subsection name =" Printing with headers" >
154+ <p >
155+ To print a CSV file with headers, you specify the headers in the format:
156+ </p >
157+ <source >final Appendable out = ...;
158+ final CSVPrinter printer = CSVFormat.DEFAULT.withHeader("H1", "H2").print(out);
159+ </source >
160+ <p >
161+ To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
162+ </p >
163+ <source >final ResultSet resultSet = ...;
164+ final CSVPrinter printer = CSVFormat.DEFAULT.withHeader(resultSet).print(out);
165+ </source >
166+ </subsection >
167+ </section >
168+ <!-- ================================================== -->
169+ </body >
77170</document >
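
A note on the "Handling Byte Order Marks" subsection in this diff: the guide delegates BOM stripping to BOMInputStream from Apache Commons IO. To illustrate what that extra step amounts to, here is a minimal standard-library sketch. It assumes UTF-8 input and handles only the three-byte UTF-8 BOM (EF BB BF). The class `BomSkippingReaderSketch` and its methods `newUtf8ReaderSkippingBom` and `readAll` are hypothetical names introduced for illustration; this is not taken from Commons IO or Commons CSV and is not meant as a replacement for BOMInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of any Apache Commons library.
final class BomSkippingReaderSketch {

    /**
     * Returns a UTF-8 reader positioned after a leading UTF-8 BOM, if one is present.
     * Assumption: the input is UTF-8; other BOMs (UTF-16, UTF-32) are not detected.
     */
    static Reader newUtf8ReaderSkippingBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() may return fewer than requested.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push the bytes back so the reader sees the stream unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    /** Drains a reader into a String; only used to inspect the result. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int r;
        while ((r = reader.read(buf)) != -1) {
            sb.append(buf, 0, r);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'I', 'D', ',', 'N', 'a', 'm', 'e'};
        try (Reader reader = newUtf8ReaderSkippingBom(new ByteArrayInputStream(data))) {
            System.out.println(readAll(reader)); // the BOM bytes are not part of the text
        }
    }
}
```

A reader obtained this way could be passed to a CSV parser in place of the plain `InputStreamReader`, which keeps the BOM from being glued onto the first header name. In real code the Commons IO class referenced in the guide is the more complete choice.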