CSV-290 - Fix the wrong assumptions in PostgreSQL formats #265
CSVFormat.POSTGRESQL_CSV - special characters are not escaped. CSVFormat.POSTGRESQL_TEXT - values are not quoted.
Codecov Report
@@ Coverage Diff @@
## master #265 +/- ##
=========================================
Coverage 97.34% 97.34%
Complexity 535 535
=========================================
Files 11 11
Lines 1169 1169
Branches 205 205
=========================================
Hits 1138 1138
Misses 18 18
Partials 13 13
Hi @angusdev

I can't test previous versions of PostgreSQL, but it is pretty safe to say that this applies to all versions. PostgreSQL has supported export to CSV since version 8.0 (2005, https://www.postgresql.org/docs/8.0/release-8-0.html). For the text format (tab-delimited), there is no reason to quote the text.
Tested that the behaviour of import and export is consistent.

Test case: export csv/tsv from PostgreSQL, read it with commons-csv and write it to a new csv/tsv, import that into PostgreSQL, export csv/tsv again, then compare the 1st and 2nd export files.

drop table COMMONS_CSV_PSQL_TEST;
create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.tsv';

Use commons-csv to read '/tmp/psql.csv' and write to '/tmp/outpsql.csv'; do the same for 'psql.tsv'.

truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.csv' with (FORMAT CSV);
truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.tsv';
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.tsv';

diff /tmp/psql.csv /tmp/psql2.csv
diff /tmp/psql.tsv /tmp/psql2.tsv
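
The TEXT half of this round trip relies on backslash escapes rather than quoting. As a rough illustration only, the sketch below encodes and decodes one tab-delimited row. It is not the Commons CSV implementation: `PgTextSketch`, `encodeRow` and `decodeRow` are hypothetical names, and only the escapes exercised by the test data above (backslash, backspace, tab, newline, carriage return) plus the `\N` null marker are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Commons CSV API): simplified model of PostgreSQL COPY text format. */
final class PgTextSketch {

    /** Encodes one row: tab-delimited, never quoted, null written as \N. */
    static String encodeRow(List<String> values) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append('\t');
            }
            String value = values.get(i);
            if (value == null) {
                out.append("\\N");
                continue;
            }
            for (char c : value.toCharArray()) {
                switch (c) {
                    case '\\': out.append("\\\\"); break; // literal backslash is doubled
                    case '\b': out.append("\\b"); break;
                    case '\t': out.append("\\t"); break;  // the delimiter itself is escaped
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    default: out.append(c);
                }
            }
        }
        return out.toString();
    }

    /** Decodes one line produced by encodeRow back into its values. */
    static List<String> decodeRow(String line) {
        List<String> values = new ArrayList<>();
        // Tabs inside data are always escaped, so every raw tab is a delimiter.
        for (String field : line.split("\t", -1)) {
            if (field.equals("\\N")) {
                values.add(null);
                continue;
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < field.length(); i++) {
                char c = field.charAt(i);
                if (c == '\\' && i + 1 < field.length()) {
                    char next = field.charAt(++i);
                    switch (next) {
                        case 'b': sb.append('\b'); break;
                        case 't': sb.append('\t'); break;
                        case 'n': sb.append('\n'); break;
                        case 'r': sb.append('\r'); break;
                        default: sb.append(next); // covers the doubled backslash
                    }
                } else {
                    sb.append(c);
                }
            }
            values.add(sb.toString());
        }
        return values;
    }
}
```

Under these assumptions, `decodeRow(encodeRow(row))` should return the original values, including nulls and empty strings. This is the same property that the diff of the two export files checks.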
Hello @angusdev

Thank you for updating your PR.

(1) I think you need to test for tab characters (ASCII 9) in values.
(2) In the PG docs I read
Please help me understand why the git master code does not match this definition.
Added a test for tab characters (ASCII 9) in values.

For QUOTE and ESCAPE, see the example below.
In PG (CSV), ESCAPE is used to escape the quote character, while in Commons CSV, ESCAPE is used to escape the delimiter and special characters.
In PG (TEXT), QUOTE is not needed because the format is tab-delimited and the delimiter (tab) is escaped as '\t'.

I tested with psql 14.5 (Homebrew) on a Mac M1.
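
To make the QUOTE/ESCAPE distinction concrete, here is a minimal sketch of how a single field could be written in PG CSV mode with default options (comma delimiter, `"` as both QUOTE and ESCAPE, unquoted empty string for NULL). `PgCsvSketch` and `encodeField` are hypothetical names, not Commons CSV or PostgreSQL API, and less common COPY rules are ignored.

```java
/** Hypothetical helper (not Commons CSV API): simplified PostgreSQL CSV field writer. */
final class PgCsvSketch {

    /** Encodes one field assuming default COPY ... WITH (FORMAT CSV) options. */
    static String encodeField(String value) {
        if (value == null) {
            return ""; // NULL is written as an unquoted empty string
        }
        // An empty string is quoted so that it stays distinguishable from NULL.
        boolean needsQuote = value.isEmpty();
        for (char c : value.toCharArray()) {
            if (c == ',' || c == '"' || c == '\n' || c == '\r') {
                needsQuote = true;
            }
        }
        if (!needsQuote) {
            return value; // tabs, backslashes and similar characters pass through unchanged
        }
        // ESCAPE only applies to the quote character: an embedded quote is doubled.
        return '"' + value.replace("\"", "\"\"") + '"';
    }
}
```

With the values from row 3 of the test table, `encodeField("b,c,d")` yields `"b,c,d"` and `encodeField("\"quoted\"")` yields `"""quoted"""`. No backslash escaping is involved in this mode.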