iCTF 2023 — escape_from_markov

We made a fake flag generator just for this CTF! It can generate flags that look like the real one! Can you find the real flag?

nc 0.cloud.chals.io 34879

We're given a Python server that looks like this:

1#!/usr/bin/env python3
2
3import io
4import random
5import string
6from collections import Counter, defaultdict
7
8with open('flag.txt', 'r') as f:
9    flag = f.read()
10
11def generate_fake_flag():
12    return 'ictf{' + ''.join([
13        random.choice(string.ascii_lowercase + string.digits + '_-') for _ in range(20)
14    ]) + '}'
15
16def derive_markov_model(texts):
17    probabilities = defaultdict(Counter)
18    for text in texts:
19        for a, b in zip(text[:-1], text[1:]):
20            probabilities[a][b] += 1
21
22    return probabilities
23
24def predict_next_char(model, prefix):
25    if not prefix:
26        prefix = 'ictf{'
27
28    last_char = prefix[-1]
29    if last_char not in model:
30        return random.choice(string.ascii_lowercase + '_')
31    else:
32        options = model[last_char]
33        options_str = ''.join(c * cnt for c, cnt in options.items())
34        return random.choice(options_str)
35
36def finish_flag(model, prefix):
37    flag = prefix
38    while flag[-1] != '}' and len(flag) < 30:
39        flag += predict_next_char(model, flag)
40    if flag[-1] != '}':
41        flag += '}'
42    return flag
43
44def main():
45    num_datapoints = int(input("How many training samples would you like?\n"))
46    percent_real = int(input("What percentage of training flags would you like to be included to make the flags look real? (max 20%)\n"))
47    assert 0 <= percent_real <= 20
48
49    num_times_real = int(num_datapoints * (percent_real / 100))
50    num_times_fake = num_datapoints - num_times_real
51
52    dataset = [flag] * num_times_real + [generate_fake_flag() for _ in range(num_times_fake)]
53
54    print("Understood, training the model...")
55    # import ipdb; ipdb.set_trace()
56    model = derive_markov_model(dataset)
57
58    print("Done! Now, how many flags would you like to generate?")
59    num_flags = int(input())
60    if num_flags > 10000:
61        print("Sorry, that's too many flags.")
62        return
63    print("Here you go:")
64    for _ in range(num_flags):
65        print(finish_flag(model, 'ictf{'))
66
67    print("Thanks for using our service! Now, if you were by some incredible chance able to find the flag, you have one chance to confirm that it's correct.")
68    if input("Flag: ") == flag:
69        print("Correct!")
70    else:
71        print("Incorrect!")
72
73if __name__ == '__main__':
74    main()

Based on the challenge title and the provided source code, it looks like the flag is loaded into a Markov chain that we get to query generated flags from.

We can connect to the server and sample 10,000 flags from the Markov chain output, then analyze weights for each character in the chain:

1import collections
2
3markov = {}
4
5with open("out.txt") as f:
6    lines = [x.strip()[5:] for x in f.readlines()]
7
8print(lines)
9for line in lines:
10    for i in range(len(line) - 1):
11        if not (line[i] in markov):
12            markov[line[i]] = collections.Counter()
13        markov[line[i]][line[i + 1]] += 1
14
15for x, v in markov.items():
16    print(f"{x}: , {v.most_common(5)}")

(assuming generated flags are saved in out.txt)

Starting from i (because of ictf) and following the most probable next letter, we get

Code

1ictf{ma

Once we reach a, however, there seem to be 3 options with roughly similar probability. Before resolving this, we can work backwards from } to get

Code

1ictf{ma ... aR3s}

Starting from n, the sequence is

Code

1n_N1

with g or a potentially following. Starting from g, the sequence is

Code

1gh7ma

So putting the 3 sequences together to resemble a roughly english word, we can infer the end of the flag looks like

Code

1ictf{ma ... n_N1gh7maR3s}

Then, because R cannot follow ma (as that would make the flag ictf{maR3s}, ending it early and rendering it nonsensical) and because n likely does not immediately follow ma (as that would make the flag ictf{man_N1gh7maR3s}, which is a bit suspicious), the next character must then be r, making the sequence

Code

1ictf{mark0v1 ... n_N1gh7maR3s}

Slotting an a into the final slot completes the flag as

Code

1ictf{mark0v1an_N1gh7maR3s}