Usage¶
Getting started¶
Installing the package makes a CLI tool available: dsx.
Run the following to inspect command line arguments:
$ dsx --help
By default, it is assumed that the database is available at ~/.database18xx.
The default path can be adjusted by exporting the environment variable
DATABASE:
$ export DATABASE=<path/to/the/database>
Note
The environment variable is only valid for the current shell and all
processes started from it. To set it permanently for all future sessions,
add the export to your .bashrc file (see this
gist).
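As a minimal sketch of the note above, the export can be appended to a startup file only when it is not already there. The helper name persist_database_path and the example path are hypothetical and not part of dsx; the startup file defaults to ~/.bashrc.

```shell
# Hypothetical helper (not part of dsx): append the DATABASE export to a
# shell startup file unless the exact same line is already present.
persist_database_path() {
    db_path="$1"                      # directory that holds the database
    rc_file="${2:-$HOME/.bashrc}"     # startup file, ~/.bashrc by default
    line="export DATABASE=\"$db_path\""
    touch "$rc_file"                  # make sure the file exists for grep
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example call with an assumed path:
# persist_database_path "$HOME/data/database18xx"
```

New shells pick up the variable on start. The current shell still needs the export shown above, or a reload of the startup file.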
Within the database, datasets for each 18xx game variant exist.
Downloading the database¶
The dsx tool can be used to download and extract the database from records18xx to the local disk.
Running the following command will install the database to either the default database path or the exported one:
$ dsx download-db
Because of how the archive is extracted, downloading the database again does not discard existing files; it only updates changed files and adds new ones. Hence, the database can be updated without losing any processed data.
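The path lookup described earlier can be combined with the download step. The sketch below downloads only when the database directory is missing. The function name ensure_database is hypothetical, and the existence check is only a rough heuristic since the internal layout of the database is not described here. Note that `${DATABASE:-...}` also falls back to the default when DATABASE is set but empty, which may differ from what dsx does.

```shell
# Hypothetical helper (not part of dsx): download the database only if the
# target directory does not exist yet.
ensure_database() {
    # Use $DATABASE when set, otherwise the documented default path.
    db_path="${DATABASE:-$HOME/.database18xx}"
    if [ -d "$db_path" ]; then
        echo "database found at $db_path"
    else
        echo "no database at $db_path, downloading"
        dsx download-db
    fi
}
```

To update an existing database, run dsx download-db directly instead, since this helper skips the download when the directory is already present.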
Generating a dataset¶
Upon the first download of the database, the transcripts are not processed yet.
To invoke the parsing of these, run the following command with the 18xx game variant specified, e.g., 1830:
$ dsx make --game G1830
This will create the parsed transcripts as well as their metadata. Additionally, the metadata for the whole dataset will be created, as well as a context describing key elements of the transcripts.
Output artifacts¶
The metadata file will be saved in the dataset root, named metadata.json.
It contains the following data:
| Field | Description |
|---|---|
| | Total number of transcripts in dataset |
| | Total number of valid transcripts in dataset |
| | Number of transcripts mapped to number of players |
| | Number of transcripts mapped to game endings |
| | Unprocessed lines, paths to transcripts that failed / could not be parsed |
Note
Valid means that the transcript could be parsed and the final game state verification was successful.
The context is saved in the dataset root as well, named context.csv.
It is primarily used to filter the dataset based on key elements, such as
number of players or game endings.
Each row describes the context of one transcript
(see TranscriptContext).
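Since the columns of context.csv are not listed here, a quick way to see which fields are available for filtering is to print the header row. This is a minimal sketch that assumes a comma-separated file whose first line is a header without quoted commas; csv_columns is a hypothetical helper, and the path to context.csv must be supplied by the caller.

```shell
# Hypothetical helper (not part of dsx): print one column name per line,
# taken from the first row of a comma-separated file.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n'
}

# Example call; replace the placeholder with the dataset root:
# csv_columns <path/to/the/dataset>/context.csv
```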
Re-generating a dataset¶
Running dsx make again for the same game variant will only re-parse
transcripts that either failed or could not be parsed during the last
generation (see debug in the metadata).
Since the parser is under active development, a dataset has to be
re-generated in full to pick up the latest parser.
The full dataset of a given 18xx game variant can be re-generated using the flag
--force:
$ dsx make --game G1830 --force
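Updating the database and re-generating a dataset often go together, so the two documented commands can be chained. The sketch below is one possible wrapper; the name refresh_dataset is hypothetical, and whether a forced re-generation is needed after every download is an assumption rather than documented behavior.

```shell
# Hypothetical helper (not part of dsx): update the database, then fully
# re-generate the dataset of one game variant.
refresh_dataset() {
    game="$1"    # game identifier as accepted by --game, e.g. G1830
    # Run the re-generation only if the download succeeded.
    dsx download-db && dsx make --game "$game" --force
}

# Example call:
# refresh_dataset G1830
```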
Inspecting a dataset¶
Datasets can be inspected. For example, the following command prints the contents of the metadata to the console:
$ dsx inspect --game G1830
Inspecting a transcript¶
Similar to the inspection of the dataset, individual processed transcripts can be inspected.
The following command will load the transcript data by its game id and print its context as well as the head of the parsed result to the console:
$ dsx load --game G1830 --game_id 201210
Creating subsets¶
The default dataset contains all available transcripts for the specific game variant, irrespective of validity, number of players or game endings.
To create more meaningful datasets, a subset from the default dataset can be created. In this subset the number of players as well as the game endings can be selected.
The example below creates a subset of the default 1830 dataset that includes only 4-player games that ended with the bank broken or with a player going bankrupt:
$ dsx subset -g G1830 -n 4 -e BankBroke -e PlayerGoesBankrupt
Running the above command will select all transcripts matching the number of
players as well as either of the two game endings.
The new dataset will be saved in the database, named
1830_4p_BankBroke_PlayerGoesBankrupt.
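The subset name in this example appears to be assembled from the game variant, the number of players and the selected game endings. The sketch below reproduces that pattern for scripting purposes. It is inferred from this single example only: subset_name is a hypothetical helper, and the way dsx orders or formats the endings in other cases is an assumption.

```shell
# Hypothetical helper (not part of dsx): build a subset name following the
# pattern seen above, <variant>_<players>p_<ending>_<ending>...
subset_name() {
    game="${1#G}"          # drop a leading "G": G1830 -> 1830
    players="$2"
    shift 2                # remaining arguments are the game endings
    name="${game}_${players}p"
    for ending in "$@"; do
        name="${name}_${ending}"
    done
    printf '%s\n' "$name"
}

# Example call:
# subset_name G1830 4 BankBroke PlayerGoesBankrupt
```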
Note
Re-generating the default dataset does not automatically update the subset, because the subset is treated as a new dataset. Creating a subset from a subset is not possible.
Note
The generation and inspection commands described above can be run on a subset as well by passing the number of players and the game endings on the command line, as shown in the example above.