System Configuration
Several aspects of Label Sleuth can be configured through the system’s configuration file.
Configuration file
The default configuration file is located at label_sleuth/config.json.
A custom configuration file can be applied by passing the --config_path parameter to the “start_label_sleuth” command, as follows:
python -m label_sleuth.start_label_sleuth --config_path <path_to_my_configuration_json>
Alternatively, it is possible to override specific configuration parameters at startup by appending them to the “start_label_sleuth” command. For example, to set up the system to work with text data in Arabic, one can set the system language by using the following command:
python -m label_sleuth.start_label_sleuth --language Arabic
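The relationship between a configuration file and startup overrides can be pictured with a minimal sketch. This is not Label Sleuth's actual loading code: the helper name `load_config_with_overrides`, the `example_setting` key, and the sample values are assumptions made for illustration, and the `language` key is inferred from the command above.

```python
import json
import os
import tempfile


def load_config_with_overrides(config_path, overrides):
    """Read a JSON configuration file and apply startup overrides on top of it.

    Hypothetical helper: keys in `overrides` whose value is None are treated
    as "not given on the command line" and leave the file's value in place.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    for key, value in overrides.items():
        if value is not None:
            config[key] = value
    return config


# Write a small stand-in configuration file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"language": "English", "example_setting": 10}, f)

    # Only `language` is overridden; `example_setting` keeps the file's value.
    merged = load_config_with_overrides(
        path, {"language": "Arabic", "example_setting": None}
    )

print(merged)
```

In this sketch, values passed at startup take precedence over the file, and any parameter that is not overridden keeps the value read from the JSON file.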
Parameters
The following parameters can be set in the configuration file:
Parameter |
Description |
---|---|
|
Number of elements that must be assigned a positive label for the category in order to trigger the training of a classification model. |
|
Number of elements that must be assigned a negative label for the category in order to trigger the training of a classification model. |
|
Number of changes in user labels for the category – relative to the last trained model – that are required to trigger the training of a new model. A change can be assigning a label (positive or negative) to an element, or changing an existing label. Note that both, |
|
Strategy to be used from TrainingSetSelectionStrategy. A TrainingSetSelectionStrategy determines which examples will be sent to the classification models at training time; these will not necessarily be identical to the set of elements labeled by the user. For currently supported implementations see the get_training_set_selector() function. |
|
Policy to be used from ModelPolicies. A ModelPolicy determines which type of classification model(s) will be used, and when (e.g. always / only after a specific number of iterations / etc.). |
|
Strategy to be used from ActiveLearningCatalog. An ActiveLearner module implements the strategy for recommending the next elements to be labeled by the user, aiming to increase the efficiency of the annotation process. |
|
Sample size to be used for estimating the precision of the current model when the precision evaluation function is invoked. |
|
Specifies how to treat elements with identical texts. If |
|
Specifies the chosen system-wide language. This determines some language-specific resources that will be used by models and helper functions (e.g., stop words). The list of supported languages can be found here. We welcome contributions of additional languages. |
|
Specifies whether or not using the system will require user authentication. If |
|
Only relevant if ”users”:[* The list of usernames is static and currently all users have access to all the workspaces in the system. |
|
Number of elements per page in the main panel, i.e., document view. |
|
Number of elements per page in the sidebar panels that use pagination. |
|
Maximum number of tokens shown for a text snippet in the right panels before it is cut off. |
|
If |
|
The maximum number of rows allowed in CSV files. |
|
The maximum number of characters allowed in document names. |
|
The maximum number of CPUs to use for running jobs. |
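To make the interplay of the three training-trigger thresholds described in the table more concrete, here is a minimal sketch. It assumes that all three conditions must hold at the same time, which the table does not fully spell out. The field names, the `should_train` function, and the numeric values are hypothetical and do not come from Label Sleuth's code.

```python
from dataclasses import dataclass


@dataclass
class TrainingThresholds:
    # Hypothetical field names; the actual configuration keys may differ.
    min_positive: int
    min_negative: int
    min_changes: int


def should_train(thresholds, positive_count, negative_count, changes_since_last_model):
    """Return True when every labeling threshold is reached (assumed rule)."""
    return (
        positive_count >= thresholds.min_positive
        and negative_count >= thresholds.min_negative
        and changes_since_last_model >= thresholds.min_changes
    )


# Illustrative values only, not the system's defaults.
thresholds = TrainingThresholds(min_positive=20, min_negative=0, min_changes=20)

# Too few positive labels and too few changes: no training.
first_check = should_train(
    thresholds, positive_count=12, negative_count=3, changes_since_last_model=15
)
# All thresholds reached: training would be triggered under the assumed rule.
second_check = should_train(
    thresholds, positive_count=25, negative_count=5, changes_since_last_model=30
)
print(first_check, second_check)
```

Under this assumed rule, lowering any one threshold in the configuration file makes training start earlier only if the other thresholds are already satisfied.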