Raw Cell Value

Quick question,


I'm trying to read a CSV file that contains data like so:

ID,Value
1,Jan 10
2,Feb 10
3,Mar 10

When I read this, FlexCel behaves in a similar way to Excel. So reading "Jan 10" yields:

GetCellValue() => 40179
GetStringFromCell() => 01/01/2010 00:00:00

My question is, is it possible for me to get the raw value from the data? Note that I probably want to do the same through excel files in the future.

Cheers

Hi,

In FlexCel the raw cell value is returned by "GetCellValue". (get stringfromcell will return the formatted value).
But in this case, what happens is that during the import, FlexCel tries to be "smart" and converts a string like "jan 10" into  the date "40179", formatting the cell as date also. The cell formatting for dates is always one of the internal formats in the machine, long or short date fomat: That's why you see "01/01/2010" instead of "Jan 10" in the cell.

So in this case, if you want to retrieve the exact text that was in the csv, there are a couple of possibilities:
1) The simplest one: When you open the csv file with XlsFile.Open() or XlsFile.Import(), there is a parameter:
If you pass an array long enough to hold all the columns you want to import, and the elements in the array are "ColumnImportType.Text", then those columns will not be converted to dates, the cell will be imported as a string "jan 10", and you will be able to see this in both GetCellValue and GetStringFromCell.

If you know the structure you are importing, then you might set some members of the array to text and others to normal, for example in your case, column 1 normal (so "1" is converted to a number and not stored as a string "1") and column 2 Text.

2) If you want to just import the dates as text, there is another parameter "dateFormats" where you can specify exactly which dates are converted to dates, and the rest will be kept as text. For example if you pass the array with the string "dd/mm/yyyy" to this parameter, it will only convert dates in that format, and others like "jan 10" will be ignored and imported as text.

Regards,
   Adrian.

Adrian,


Unfortunately the sample data was the simplest case to reproduce the problem, and not the actual use case. Our data will actually be of the form:

ID, Jan 10, Feb 10, Mar 10 ... n
 A,     1,            2,           3, ... n
 B      4,            5,           6, ...n
 C      7,            8,           9, ...n

So before I open the file, I don't know the number of columns, and additionally it's only actually my header row which is text. Therefore I don't think option #1 will work.

Option #2 probably won't work either unfortunately, a slightly more complicated case than above:

ID, Jan 10, Feb 10, Mar 10 ... n
 A,  1/7/10, 2/8/10, 17/9/2010, ...n
 B   1/8/10, 3/9/10, 19/6/11, ...n
 C   1/9/10, 4/12/10, 12-Jan-2011, ...n

Unfortunately in this case, I want to treat headers as text again, but all the other content should be treated as is. I also can't guarantee that the format of the header will be (mmm yy).
Hi,
About option 1), a remark:
If you don't know how many columns, but you know a maximum of columns, then you can still use it. Just use an array with the maximum number of columns and all extra columns will be ignored. It used to be simpler in xls time when we designed this and the maximum number of columns was 256, so you could just enter an array of 256 values. Now with xlsx you should fill an array with 16000 values, which is wasteful, but would work. Or you can use an array with 256 if you know that no file will have more than that.(and they shouldn't...). We should add an option for the default of the columns anyway, now that you can have more than 256 cols.

Now, back to the problem, I could suggest using "Virtual Mode" but that's a new feature in 5.5, so you would have to request it by mail to me if you need it right now. If you open the csv in virtual mode, you will get an even called for every cell you are importing, so you will get the text, the value, the column and the row of the cell, and you can decide in a cell by cell basis.

A better solution would be for us to add an event where you can format what you import, but I will be sincere, we are *very* very late in launching 5.5, and if we keep adding features we will never ship it. If it is very trivial to add this event we will do it, if not it will have to wait until after 5.5 is released. I will let you know.

Regards,
    Adrian.

Adrian,


Would I be correct in assuming then that the Cell doesn't have it's raw value anymore? I'm wondering whether or not it would be possible to add a 'GetRawValue()' method or similar, which just bypasses the logic of applying formats and attempting to emulate Excel functionality?

No, the "raw" value isn't there once you import the file, it wouldn't make sense.  FlexCel offers two ways to import the data:

1) Import the raw or text values into the cells. As said, you can do this with ColumnFormat parameter when opening the csv. When you do this, all cells will have text, so you will have the string "1" instead of the number 1. And say in cell A1 you will get  the string "Feb 10".

2) Try to convert the strings to something more useful. This is the default, and in this case, the string "1" will be converted to the number 1. The string "Feb 10" will be converted to a date (which in Excel is a number, for example 40179). When converting the date strings to Excel dates (numbers) the cells are also formatted so they are dislayed like dates. The problem is that we need to guess from the string "Feb 10" that we should format as "mmmm dd". doing this for a generic case (converting a string with a date into a "yyy-ddd-mmm" format is next to impossible, so we (and Excel too) use a set of predefined formats. so feb 10 will be imported as the number "40179", and the cell formatted as "dd/mm/yyyy hh:mm:ss". Of course it would be nicer if we applied a "mmm dd" format, but as said, this is very difficult.

In any case, when we convert, the cell now has a raw value of "40179", and the formatted string is "dd/mm/yyyy hh:mm:ss". If you change the format of this cell to be "mmm dd" if will show as "feb 10". But the original string "Feb 10" is not anywhere anymore.