Data configuration
missing_values
missing_values (List)
Default value ['', '?', 'None', 'nan', 'NA', 'N/A', 'unknown', 'inf', '-inf', '1.7976931348623157e+308', '-1.7976931348623157e+308']
The list of values that should be interpreted as missing values during data import. This applies to both numeric and string columns. Note that the dataset must be reloaded after applying changes to this config via the expert settings. Also note that ‘nan’ is always interpreted as a missing value for numeric columns.
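As an illustration of what this setting means, the sketch below applies the default list to a small CSV and replaces every matching cell with `None`. It is not the product's import code; the helper name `load_rows` and the sample data are hypothetical, and only the list itself comes from the default value above.

```python
import csv
import io

# Default list copied from the setting above.
MISSING_VALUES = {
    '', '?', 'None', 'nan', 'NA', 'N/A', 'unknown', 'inf', '-inf',
    '1.7976931348623157e+308', '-1.7976931348623157e+308',
}


def load_rows(text, missing_values=MISSING_VALUES):
    """Hypothetical helper: parse CSV text and map missing-value tokens to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (None if cell in missing_values else cell) for name, cell in row.items()}
        for row in reader
    ]


# Made-up sample data with both numeric-looking and string columns.
csv_text = "age,city\n34,Paris\n?,unknown\n1.7976931348623157e+308,N/A\n51,Oslo\n"
rows = load_rows(csv_text)
missing_per_column = {
    name: sum(row[name] is None for row in rows) for name in ("age", "city")
}
print(missing_per_column)
```

The match is an exact string comparison in this sketch, so a token such as `Unknown` (capitalized) would not be treated as missing unless it is added to the list.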
max_rows_col_stats
max_rows_col_stats (Number)
Default value 1000000
Maximum number of rows used to compute column statistics. If the dataset has more rows than this, a random sample of rows is used instead.
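The behavior described for this setting can be sketched as follows: use all rows when the column fits within the limit, otherwise draw a random sample of that size. This is a minimal sketch only; the function name `column_stats`, the fixed seed, and the particular statistics computed are assumptions, not the product's implementation.

```python
import random
import statistics


def column_stats(values, max_rows_col_stats=1_000_000, seed=0):
    """Hypothetical helper: compute basic stats on at most max_rows_col_stats rows."""
    if len(values) > max_rows_col_stats:
        # Too many rows: fall back to a random sample without replacement.
        rng = random.Random(seed)
        values = rng.sample(values, max_rows_col_stats)
    return {
        "rows_used": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


small = column_stats(list(range(10)), max_rows_col_stats=100)
large = column_stats(list(range(1000)), max_rows_col_stats=100)
print(small["rows_used"], large["rows_used"])
```

Statistics computed on a sample are estimates, so values such as the minimum and maximum may differ from those of the full column.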
max_cols_gui_headtail
max_cols_gui_headtail (Number)
Default value 1000
Maximum number of columns to show in the GUI from each of the head and the tail of the dataset. This is useful when the head or tail contains all the necessary columns, but the dataset has too many columns for the UI or web server to handle. -1 means no limit. A reasonable value is 500; beyond that, the web server or browser can become overloaded and use too much memory. When the limit applies, some column counts in the UI may not display correctly, and some dataset details functions may not work. To select (from the GUI or a client) any column as the target, weight column, fold column, time column, time column group, or dropped column, that column must be within the selected head or tail set of columns.
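The head-and-tail selection can be pictured with the sketch below, which also shows why a column in the middle of a wide dataset cannot be chosen as a target. The function name `visible_columns` is hypothetical, and returning every column when the head and tail would overlap is an assumption rather than documented behavior.

```python
def visible_columns(columns, max_cols_gui_headtail=1000):
    """Hypothetical helper: columns kept from the head and tail of a dataset."""
    columns = list(columns)
    n = max_cols_gui_headtail
    # -1 means no limit; also keep everything if head and tail would overlap.
    if n == -1 or len(columns) <= 2 * n:
        return columns
    head = columns[:n]
    tail = columns[len(columns) - n:]
    return head + tail


all_columns = [f"c{i}" for i in range(10)]
shown = visible_columns(all_columns, max_cols_gui_headtail=2)
print(shown)
print("c5" in shown)  # a middle column is not selectable under this limit
```

If a required column falls outside the shown set, raising the limit or reordering the dataset columns so that the column sits in the head or tail would make it selectable.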