Quick Start
Data
Preprocessed Files
data.txt
- Data file in a unified format
Q_table.npy
- ndarray of shape (n_question, n_concept), where each row is a one-hot or multi-hot vector indicating which concepts the question corresponds to
statics_preprocessed.json
- Statistics of the dataset
[question|concept]_id_map.csv
- The correspondence between the original question/concept IDs and the mapped question/concept IDs (mapped to integers starting from 0), as well as meta information about the questions and concepts
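Because each row of Q_table.npy is a one-hot or multi-hot vector, the concepts of a question (or the questions of a concept) are simply the non-zero entries of a row (or column). The sketch below shows this with NumPy; the helper names and the commented-out file path are hypothetical, and a toy Q table stands in for a real one.

```python
import numpy as np


def concepts_of_question(q_table: np.ndarray, question_id: int) -> list:
    """Return the mapped concept IDs attached to one question (non-zero entries of its row)."""
    return np.flatnonzero(q_table[question_id]).tolist()


def questions_of_concept(q_table: np.ndarray, concept_id: int) -> list:
    """Return the mapped question IDs that involve one concept (non-zero entries of its column)."""
    return np.flatnonzero(q_table[:, concept_id]).tolist()


if __name__ == "__main__":
    # Toy Q table: 3 questions, 4 concepts; question 1 is a multi-concept question.
    # For a real dataset you would load the file instead, e.g.
    # q_table = np.load("dataset_preprocessed/assist2009/Q_table.npy")  # hypothetical path
    q_table = np.array([[1, 0, 0, 0],
                        [0, 1, 1, 0],
                        [0, 0, 0, 1]])
    n_question, n_concept = q_table.shape
    print(n_question, n_concept)             # 3 4
    print(concepts_of_question(q_table, 1))  # [1, 2]
    print(questions_of_concept(q_table, 3))  # [2]
```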
Format of data.txt
Example from the EdNet-KT1 dataset:
user_id,seq_len;question_seq,correctness_seq,time_seq,use_time_seq
0
76
0,1,2,3,4,5,6,8,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,28,27,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,46,32,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,64,30,48,65
1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,1,1,1,0,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0
1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515916,1515917,1515917,1515917,1515917,1515917,1515917,1515917,1515917,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517126,1517184,1517184,1517184,1517184,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185,1517185
20,21,21,27,13,13,13,12,12,12,12,12,12,18,18,18,26,16,6,19,27,29,53,53,53,53,51,51,51,51,20,19,18,17,15,14,16,16,18,16,14,17,18,16,15,18,15,30,20,21,12,16,15,32,29,34,18,19,16,32,18,6,17,14,16,31,20,20,21,18,25,15,19,21,22,23
1
77
70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,33,130,131,132,37,133,134,135,49,136,137,138,63,139,140,141,45
0,0,0,0,1,0,0,1,1,0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,1,0,0,0,1,0,0,1,1,0,1,1,1,1,1,0,0,1,0,0,1,1,1,0,0,1,1,0,0,0,1,0,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0
1562640,1562640,1562640,1562640,1562640,1562640,1562640,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1564381,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565079,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1565973,1566090,1566090,1566090,1566090,1566090,1566437,1566437,1566437,1566437,1566437,1568368,1568368,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568550,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551,1568551
29,21,27,10,7,4,16,36,33,33,19,34,42,9,21,24,4,37,160,24,8,5,21,32,16,24,19,9,14,5,19,4,15,10,11,9,5,9,15,11,7,38,15,20,21,9,15,18,15,18,20,10,24,5,20,13,10,12,9,15,20,13,14,19,9,9,14,14,21,10,14,13,21,16,18,17,23
...
The first (header) line is split by a semicolon (;). The fields on the left are the user-level information contained in the dataset, and the fields on the right are the sequences describing each interaction between the user and a question. Each user record then lists these fields in the same order, one field per line, as in the example above.
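The layout described above can be read back with a small parser. This is a minimal sketch, not the library's own loader: it assumes that user-level fields are single integers, that every sequence field is a comma-separated list of integers, and that each sequence has exactly seq_len entries, all of which hold in the EdNet-KT1 example but may not hold for every dataset. The function name is hypothetical.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[int, List[int]]]


def parse_data_txt(text: str) -> List[Record]:
    """Parse data.txt content: a header line, then one line per field for each user."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    user_part, seq_part = lines[0].split(";")
    user_fields = user_part.split(",")  # e.g. ["user_id", "seq_len"]
    seq_fields = seq_part.split(",")    # e.g. ["question_seq", "correctness_seq", ...]
    n_fields = len(user_fields) + len(seq_fields)

    records: List[Record] = []
    for start in range(1, len(lines), n_fields):
        block = lines[start:start + n_fields]
        if len(block) < n_fields:
            raise ValueError(f"incomplete record starting at non-empty line {start + 1}")
        record: Record = {}
        # User-level fields: one integer per line (assumption).
        for name, value in zip(user_fields, block[:len(user_fields)]):
            record[name] = int(value)
        # Interaction-level fields: comma-separated integers (assumption).
        for name, value in zip(seq_fields, block[len(user_fields):]):
            record[name] = [int(v) for v in value.split(",")]
        # Sanity check: every sequence should contain seq_len entries (assumption).
        if "seq_len" in record:
            for name in seq_fields:
                if len(record[name]) != record["seq_len"]:
                    raise ValueError(f"{name} of user {record.get('user_id')} has the wrong length")
        records.append(record)
    return records
```

Usage would look like `records = parse_data_txt(open("data.txt").read())`, after which `records[0]["question_seq"]` holds the first user's question IDs. Grouping by a fixed number of lines relies on every record having the same fields as the header.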
Features
Concept Aggregation
The official code of some models, such as DKT, can only run on single-concept datasets (i.e., each question corresponds to exactly one concept). We designed an embedding layer that automatically looks up the concept IDs corresponding to a question ID and returns the aggregated (for example, average-pooled) embedding of those concepts for that question.
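Average-pooling concept embeddings through the Q table reduces to a row lookup, a matrix product, and a division by the number of concepts per question. The sketch below uses plain NumPy arrays rather than a trainable layer, and the function name is hypothetical; it only illustrates the aggregation, not the library's actual embedding layer.

```python
import numpy as np


def aggregate_concept_embedding(q_table: np.ndarray,
                                concept_emb: np.ndarray,
                                question_ids: np.ndarray) -> np.ndarray:
    """Average-pool the embeddings of all concepts attached to each question.

    q_table:      (n_question, n_concept) one-hot / multi-hot Q table
    concept_emb:  (n_concept, dim) concept embedding matrix
    question_ids: integer array of any shape, e.g. (batch, seq_len)
    returns:      array of shape question_ids.shape + (dim,)
    """
    rows = q_table[question_ids].astype(concept_emb.dtype)  # (..., n_concept)
    counts = rows.sum(axis=-1, keepdims=True)                # number of concepts per question
    counts = np.maximum(counts, 1)                           # guard questions with no concept
    return (rows @ concept_emb) / counts                     # (..., dim)
```

For a single-concept question this returns that concept's embedding unchanged, so a model written for single-concept data can consume the output without modification.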
Mask Questions or Concepts
Let Q denote the number of questions in the dataset and C the number of concepts in the dataset.
- Example 1: Use only the questions of the assist2009 dataset to train DKT.
  - Create a folder dataset_preprocessed/assist2009-no-concept and save a Q×Q identity matrix (unit diagonal matrix) in it, named Q_table.npy.
  - When training DKT, set the parameter dataset_name to assist2009-no-concept.
  - Our code will automatically read the information related to questions and concepts from the Q table.
- Example 2: Use only the concepts of the assist2009 dataset to train AKT.
  - Create a folder dataset_preprocessed/assist2009-no-question and save a C×C identity matrix (unit diagonal matrix) in it, named Q_table.npy.
  - When training AKT, set the parameter dataset_name to assist2009-no-question.
  - Our code will automatically read the information related to questions and concepts from the Q table.
- Example 3: Run models that can only process single-concept datasets (such as CLKT, DTransformer, HawkesKT, ABQR, and HDLPKT).
  - Run the script examples/knowledge_tracing/mc2sc.py, which treats each combination of multiple concepts as a new concept and regenerates Q_table.npy. For example, for the assist2009 dataset, the assist2009-single-concept dataset will be generated.
  - When training the model, set the parameter dataset_name to assist2009-single-concept.
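The Q tables used in these examples can be sketched with NumPy. `save_identity_q_table` writes the identity matrix for Example 1 (n = Q, which can be read from the first dimension of the original Q_table.npy) or Example 2 (n = C). `merge_concept_combinations` illustrates the idea behind Example 3, under the assumption that every distinct row of the original Q table (every distinct set of concepts) becomes one new concept; the actual mc2sc.py may number the new concepts differently and also produces the other dataset files. Both function names and the integer dtype are hypothetical choices.

```python
import os
import tempfile

import numpy as np


def save_identity_q_table(folder: str, n: int) -> str:
    """Save an n x n identity matrix as <folder>/Q_table.npy and return its path.

    Example 1 uses n = Q (each question becomes its own concept);
    Example 2 uses n = C (each concept is treated as a question).
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "Q_table.npy")
    np.save(path, np.eye(n, dtype=np.int64))
    return path


def merge_concept_combinations(q_table: np.ndarray):
    """Turn a multi-concept Q table into a single-concept one (assumed scheme).

    Every distinct row (i.e. every distinct set of concepts) becomes one new concept.
    Returns (new_q_table, combos): new_q_table is one-hot with shape
    (n_question, n_new_concept), and combos[k] is the original row behind new concept k.
    """
    combos, inverse = np.unique(q_table, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the shape of `inverse` differs across NumPy versions
    n_question = q_table.shape[0]
    new_q_table = np.zeros((n_question, combos.shape[0]), dtype=q_table.dtype)
    new_q_table[np.arange(n_question), inverse] = 1
    return new_q_table, combos


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = save_identity_q_table(os.path.join(tmp, "assist2009-no-concept"), 4)
        print(np.load(path).shape)        # (4, 4)

    q_table = np.array([[1, 0, 1],
                        [0, 1, 0],
                        [1, 0, 1],
                        [1, 1, 0]])
    new_q_table, combos = merge_concept_combinations(q_table)
    print(new_q_table.argmax(axis=1))     # [1 0 1 2]
```

In the toy table, questions 0 and 2 share the concept set {0, 2}, so they map to the same new concept, while the single-concept question 1 and the pair {0, 1} each get their own.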
Code Optimization
For GIKT and KG4EX, the inference process in the official code is implemented with loops. To accelerate inference, this library rewrites the loop-based code as matrix operations, achieving a 10–100x speedup. In addition, the official GIKT code consumes a large amount of GPU memory; this has been optimized as well.
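The GIKT and KG4EX code itself is not shown here, so the following is only a generic illustration of the loop-to-matrix rewrite pattern, not the library's implementation: scoring each of B states against its own K candidate vectors, first with explicit Python loops and then with a single batched `np.einsum` call. The shapes and function names are hypothetical.

```python
import numpy as np


def scores_loop(states: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Dot product of each state with each of its K candidate items, one pair at a time."""
    batch, k, _ = items.shape
    out = np.zeros((batch, k), dtype=states.dtype)
    for b in range(batch):
        for j in range(k):
            out[b, j] = states[b] @ items[b, j]
    return out


def scores_vectorized(states: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Same result in one batched operation: (B, d) x (B, K, d) -> (B, K)."""
    return np.einsum("bd,bkd->bk", states, items)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(32, 16))
    items = rng.normal(size=(32, 10, 16))
    # Both versions agree; the vectorized one avoids B * K Python-level iterations.
    assert np.allclose(scores_loop(states, items), scores_vectorized(states, items))
```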