Document

Quick Start

Data

Preprocessed Files

data.txt - Data file in a unified format
Q_table.npy - NumPy ndarray of shape (n_question, n_concept). Each row is a one-hot or multi-hot vector marking the concepts that correspond to that question
statics_preprocessed.json - Statistics of the dataset
[question|concept]_id_map.csv - The correspondence between the original question/concept ID and the mapped question/concept ID (mapped to integers, starting from 0), as well as the meta information of the question and concept

Format of `data.txt`

Example of Ednet-kt1 dataset

user_id,seq_len;question_seq,correctness_seq,time_seq,use_time_seq
0
76
0,1,2,3,4,5,6,8,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,28,27,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,46,32,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,64,30,48,65
1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,1,1,1,0,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0
1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515917,1515917,1515917,1515917,1515917,1515917,1515917,1515917,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517184,1517184,1517184,1517184,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185
20,21,21,27,13,13,13,12,12,12,12,12,12,18,18,18,26,16,6,19,27,29,53,53,53,53,51,51,51,51,20,19,18,17,15,14,16,16,18,16,14,17,18,16,15,18,15,30,20,21,12,16,15,32,29,34,18,19,16,32,18,6,17,14,16,31,20,20,21,18,25,15,19,21,22,23
1
77
70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,33,130,131,132,37,133,134,135,49,136,137,138,63,139,140,141,45
0,0,0,0,1,0,0,1,1,0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,1,0,0,0,1,0,0,1,1,0,1,1,1,1,1,0,0,1,0,0,1,1,1,0,0,1,1,0,0,0,1,0,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0
1562640,1562640,1562640,1562640,1562640,1562640,1562640,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1566090,1566090,1566090,1566090,1566090,1566437,1566437,1566437,1566437,1566437,1568368,1568368,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551
29,21,27,10,7,4,16,36,33,33,19,34,42,9,21,24,4,37,160,24,8,5,21,32,16,24,19,9,14,5,19,4,15,10,11,9,5,9,15,11,7,38,15,20,21,9,15,18,15,18,20,10,24,5,20,13,10,12,9,15,20,13,14,19,9,9,14,14,21,10,14,13,21,16,18,17,23
...

The first line is a header split by a semicolon (;). The fields to its left are the user-level information contained in the dataset, and the fields to its right are the information recorded for each interaction between the user and a question. In the example above, each user then occupies one line per field, in header order.
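
The layout described above can be read with a short parser. The following is a minimal sketch, not the library's own loader: the function name `parse_unified` is hypothetical, and it assumes that every value is an integer and that each user is stored as one line per header field, as in the Ednet-kt1 example.

```python
import io


def parse_unified(stream):
    """Parse a header line followed by one line per field for each user.

    Assumption: all values are integers, as in the Ednet-kt1 example.
    """
    header = stream.readline().strip()
    user_part, seq_part = header.split(";")
    user_fields = user_part.split(",")   # e.g. user_id, seq_len
    seq_fields = seq_part.split(",")     # e.g. question_seq, correctness_seq
    n_fields = len(user_fields) + len(seq_fields)

    # Drop empty lines so a trailing newline does not break the chunking.
    lines = [ln.strip() for ln in stream if ln.strip()]
    if len(lines) % n_fields != 0:
        raise ValueError("file does not contain a whole number of user records")

    users = []
    for start in range(0, len(lines), n_fields):
        chunk = lines[start:start + n_fields]
        record = {}
        # User-level fields hold a single value per line.
        for name, value in zip(user_fields, chunk):
            record[name] = int(value)
        # Sequence fields hold comma-separated values, one per interaction.
        for name, value in zip(seq_fields, chunk[len(user_fields):]):
            record[name] = [int(v) for v in value.split(",")]
        users.append(record)
    return users


# Tiny in-memory sample with two users and two sequence fields.
sample = io.StringIO(
    "user_id,seq_len;question_seq,correctness_seq\n"
    "0\n3\n0,1,2\n1,0,1\n"
    "1\n2\n3,4\n0,0\n"
)
users = parse_unified(sample)
```

For a real file, the same function could be called on an open file handle, for example `parse_unified(open("data.txt"))`.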

Features

Concept Aggregation

The official code of some models, such as DKT, can only run on single-concept datasets (i.e., each question corresponds to exactly one concept). We designed an embedding layer that looks up the concept IDs for a given question ID and returns the aggregated (for example, average-pooled) concept embedding for that question.
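
To illustrate the idea, here is a minimal NumPy sketch of average pooling over the concepts of each question. It is not the library's embedding layer: the function name `aggregate_concept_embeddings` and the toy arrays are assumptions made for this example, and a real layer would hold trainable embeddings in a deep learning framework.

```python
import numpy as np


def aggregate_concept_embeddings(q_table, concept_emb, question_ids):
    """Return the mean concept embedding for each question ID.

    q_table:      (n_question, n_concept) one-hot or multi-hot matrix
    concept_emb:  (n_concept, dim) concept embedding matrix
    question_ids: (batch,) integer question IDs
    """
    q_rows = q_table[question_ids].astype(concept_emb.dtype)  # (batch, n_concept)
    summed = q_rows @ concept_emb                              # (batch, dim)
    counts = q_rows.sum(axis=1, keepdims=True)                 # concepts per question
    # Guard against questions without any concept to avoid division by zero.
    return summed / np.maximum(counts, 1)


# Question 0 has concept 0; question 1 has concepts 1 and 2.
q_table = np.array([[1, 0, 0],
                    [0, 1, 1]])
concept_emb = np.array([[1.0, 1.0],
                        [2.0, 4.0],
                        [4.0, 0.0]])
pooled = aggregate_concept_embeddings(q_table, concept_emb, np.array([0, 1]))
# Question 0 keeps [1, 1]; question 1 averages [2, 4] and [4, 0] to [3, 2].
```

Multiplying the Q-table rows by the embedding matrix sums the embeddings of the linked concepts, so a single-concept question simply returns that concept's embedding.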

Mask Questions or Concepts

Let Q denote the number of questions in the dataset and C the number of concepts.

Example 1: Use only the questions of the assist2009 dataset to train DKT.
1. Create the folder dataset_preprocessed/assist2009-no-concept and save a Q×Q identity matrix in it as Q_table.npy
2. When training DKT, set the parameter dataset_name to assist2009-no-concept
3. Our code will automatically read the question and concept information from the Q table
Example 2: Use only the concepts of the assist2009 dataset to train AKT.
1. Create the folder dataset_preprocessed/assist2009-no-question and save a C×C identity matrix in it as Q_table.npy
2. When training AKT, set the parameter dataset_name to assist2009-no-question
3. Our code will automatically read the question and concept information from the Q table
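
Step 1 of Examples 1 and 2 can be sketched as follows. This is only an illustration: the helper name `write_identity_q_table`, the integer dtype, the temporary root directory and the size 5 are assumptions for the example. In practice the size would be Q or C for the dataset, and the source does not say whether other files must also be placed in the folder.

```python
import os
import tempfile

import numpy as np


def write_identity_q_table(folder, size):
    """Create `folder` if needed and save a size x size identity Q table."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "Q_table.npy")
    # Assumption: an integer dtype is acceptable for the Q table.
    np.save(path, np.eye(size, dtype=np.int64))
    return path


# Written under a temporary root here so the example has no side effects.
root = tempfile.mkdtemp()
folder = os.path.join(root, "dataset_preprocessed", "assist2009-no-concept")
path = write_identity_q_table(folder, 5)  # 5 stands in for Q (or C)
loaded = np.load(path)
```

With an identity Q table, every question maps to exactly one "concept" of its own, so question IDs and concept IDs coincide.
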
Example 3: Run models that can only process single-concept datasets (such as CLKT, DTransformer, HawkesKT, ABQR and HDLPKT).
1. Run the script examples/knowledge_tracing/mc2sc.py, which treats each combination of multiple concepts as a new concept and regenerates Q_table.npy. For example, for the assist2009 dataset, it generates the assist2009-single-concept dataset.
2. When training the model, set the parameter dataset_name to assist2009-single-concept
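
The conversion in Example 3 can be pictured with the sketch below. It is not the code of mc2sc.py: the function name `multi_to_single_concept` and the choice to number new concepts by first appearance are assumptions made for illustration.

```python
import numpy as np


def multi_to_single_concept(q_table):
    """Map each distinct concept combination (Q-table row) to one new concept."""
    combo_to_id = {}
    new_ids = []
    for row in q_table:
        key = tuple(int(v) for v in row)
        # Assumption: new concept IDs are assigned in order of first appearance.
        if key not in combo_to_id:
            combo_to_id[key] = len(combo_to_id)
        new_ids.append(combo_to_id[key])

    # Build a one-hot Q table over the new concept IDs.
    new_q = np.zeros((q_table.shape[0], len(combo_to_id)), dtype=q_table.dtype)
    new_q[np.arange(q_table.shape[0]), new_ids] = 1
    return new_q


# Questions 1 and 2 share the combination {concept 1, concept 2}.
q_table = np.array([[1, 0, 0],
                    [0, 1, 1],
                    [0, 1, 1],
                    [0, 0, 1]])
single = multi_to_single_concept(q_table)
```

Every row of the resulting table is one-hot, which is what single-concept models expect, and questions that shared the same concept combination share the same new concept.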

Code Optimization

For GIKT and KG4EX, the inference process in the official code is implemented with loops. To accelerate inference, this library rewrites the loop-based code as matrix operations, achieving a 10–100x speedup. The official GIKT code also consumes a significant amount of GPU memory, and its memory usage has been optimized as well.