Reka Flash 3 vs QwQ 32B - A Battle of Open Source


Apr 21, 2025 - 12:08

As an AI enthusiast, I've been closely following the recent developments in large language models. Today, I'm diving into a detailed comparison between two powerful contenders: Reka Flash 3 and QwQ 32B.

Both models have been making waves in the AI community, but how do they actually perform across various domains? Let's uncover the hype.

Coding

The first domain in which we will test both models is coding. Current LLMs are proficient at writing clean and optimized code, so let's see how both models perform on 2 different tasks.

3D Simulation - JavaScript

3D simulation requires accurate particle physics calculations, proper collision detection logic, and performance optimizations for large particle counts. Most proprietary LLMs nowadays do pretty well here but often lack clean and concise code. Let's see how our contenders perform.

For this test, I am going to ask for a basic JavaScript simulation of a rotating 3D sphere made of characters that become brighter the closer they are to the viewer.
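To make the task concrete, here is a minimal sketch of the math such a simulation needs: placing points on a sphere, rotating them, and turning depth into brightness. It is written in Python rather than JavaScript purely for illustration, and it is not code produced by either model. The function names, the Fibonacci-lattice placement, and the grey levels (80 for far, 255 for near) are all my own assumptions.

```python
import math


def sphere_points(count):
    """Spread `count` points roughly evenly over a unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(count):
        y = 1 - 2 * (i + 0.5) / count      # height from just below 1 down to just above -1
        radius = math.sqrt(1 - y * y)      # radius of the horizontal circle at that height
        theta = golden_angle * i
        points.append((radius * math.cos(theta), y, radius * math.sin(theta)))
    return points


def rotate_y(point, angle):
    """Rotate a point around the vertical (Y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)


def brightness(z, near=255, far=80):
    """Map depth z in [-1, 1] (1 = closest to the viewer) to a grey level."""
    t = (z + 1) / 2                        # normalise depth to [0, 1]
    return round(far + t * (near - far))


# One animation frame: rotate every point, then shade it by its depth.
frame = [rotate_y(p, math.radians(30)) for p in sphere_points(50)]
shades = [brightness(z) for (_, _, z) in frame]
```

In a browser version, each point would be drawn as a letter on a canvas and the rotation angle would advance on every frame. The depth-to-grey mapping is the part the prompt cares about.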

Prompt: 3D Simulation using JavaScript

Create a JavaScript simulation of a rotating 3D sphere made up of letters. 
The closest letters should be brighter, while the ones farthest away should appear grey.

Output: Reka Flash 3

Code link: 3d_rotating_letters_reka.html

How To Use:

Save the code as an HTML file
Open it in a modern web browser

Output: Reka Flash 3

The result was completely different from what I expected: no dark background, no working functionality, and a very long generation time.

Despite multiple attempts, the model failed to generate code that produced the right output and kept repeating its previous mistakes. I suspect this was the result of overthinking.

However, when I tried the same prompt with numbers instead of letters, it generated a good output with all the functionality.

Reka 3D Generation Test 2: 3d_rotating_numbers_reka.html

Output: QwQ 32b

Code Link: JS (3d_rotating_letters_qwq.js), HTML (3d_rotating_letters_reka.html)

Browser Output (3d letter rotating simulation)

The model generated bad code on the first attempt, but the second attempt produced a working output. The output is modular, polished, renders perfectly, and follows all the instructions in the prompt except speed controls. Definitely a good choice for small 3D scene rendering tasks.

The only thing I disliked was that it didn't provide run instructions unless explicitly asked.

Let's check the performance on LeetCode.

LeetCode - Problem #3463

I found this one while writing another blog post, and it became my benchmark question for evaluating the coding capabilities of LLMs, since it requires first-principles thinking rather than library usage. Let's see how Reka Flash 3 and QwQ 32b perform:

Note: This problem has an acceptance rate of only 8.8% on LeetCode, and among the models I have tried only Grok 3 was able to solve it. In essence, it's a really hard problem.

Prompt: 3463. Check If Digits Are Equal in String After Operations II

I am giving you a leetcode problem with test case, your job is to write a clean and concise code. 
The question will have private test cases, so make sure your cover edge cases as well. 
Here is the problem Statement : 

You are given a string s consisting of digits. Perform the following operation repeatedly until the string has exactly two digits:

For each pair of consecutive digits in s, starting from the first digit, calculate a new digit as the sum of the two digits modulo 10.
Replace s with the sequence of newly calculated digits, maintaining the order in which they are computed.
Return true if the final two digits in s are the same; otherwise, return false.

Example 1:

Input: s = "3902"

Output: true

Explanation:

Initially, s = "3902"
First operation:
(s[0] + s[1]) % 10 = (3 + 9) % 10 = 2
(s[1] + s[2]) % 10 = (9 + 0) % 10 = 9
(s[2] + s[3]) % 10 = (0 + 2) % 10 = 2
s becomes "292"
Second operation:
(s[0] + s[1]) % 10 = (2 + 9) % 10 = 1
(s[1] + s[2]) % 10 = (9 + 2) % 10 = 1
s becomes "11"
Since the digits in "11" are the same, the output is true.

Example 2:

Input: s = "34789"

Output: false

Explanation:

Initially, s = "34789".
After the first operation, s = "7157".
After the second operation, s = "862".
After the third operation, s = "48".
Since '4' != '8', the output is false.


You can try solving this qn using 1st principle reasonig

Output: Reka Flash 3

class Solution:
    def hasSameDigits(self, s: str) -> bool:
        while len(s) > 2:
            new_s = []
            for i in range(len(s) - 1):
                new_digit = (int(s[i]) + int(s[i+1])) % 10
                new_s.append(str(new_digit))
            s = ''.join(new_s)
        return s[0] == s[1]

Results (Time Limit Exceeded error)

Time Limit Exceeded

The code passed all public test cases but hit the time limit on the private test cases, because it simulates every round and so does work that grows quadratically with the length of s.

Output: QwQ 32b

class Solution:
    def hasSameDigits(self, s: str) -> bool:
        digits = list(map(int, s))
        while len(digits) > 2:
            new_digits = []
            for i in range(len(digits) - 1):
                new_digit = (digits[i] + digits[i + 1]) % 10
                new_digits.append(new_digit)
            digits = new_digits
        return digits[0] == digits[1]

Results (Time Limit Exceeded error)

Time Limit Exceeded

As expected, both models failed to solve the problem despite being nudged to reason from first principles. Both simply simulated every round instead of finding a shortcut, which shows that their capability here is limited.
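For reference, here is a sketch of the kind of shortcut that first-principles reasoning can lead to. Neither model produced it, and I am not claiming it is the official solution. The observation is that after all the rounds, each of the two final digits is a binomial-weighted sum of the original digits, so only C(n-2, i) mod 10 is needed. That value can be assembled from the residues mod 2 and mod 5 (Lucas' theorem) using the Chinese Remainder Theorem. The function names are my own.

```python
from math import comb


def binom_mod2(n, k):
    """C(n, k) mod 2: odd exactly when every set bit of k is also set in n."""
    return 1 if (n & k) == k else 0


def binom_mod5(n, k):
    """C(n, k) mod 5 via Lucas' theorem, working digit by digit in base 5."""
    result = 1
    while n > 0 or k > 0:
        n, n_digit = divmod(n, 5)
        k, k_digit = divmod(k, 5)
        if k_digit > n_digit:
            return 0
        result = result * comb(n_digit, k_digit) % 5
    return result


def binom_mod10(n, k):
    """Combine the mod 2 and mod 5 residues (5 is 1 mod 2, 6 is 1 mod 5)."""
    return (5 * binom_mod2(n, k) + 6 * binom_mod5(n, k)) % 10


def has_same_digits(s):
    """Compare the two final digits without simulating each round."""
    m = len(s) - 2                      # number of rounds performed
    digits = [int(ch) for ch in s]
    diff = 0
    for i in range(m + 1):
        # The coefficient applies to digits[i] in the first final digit
        # and to digits[i + 1] in the second one.
        diff += binom_mod10(m, i) * (digits[i] - digits[i + 1])
    return diff % 10 == 0


print(has_same_digits("3902"))   # True
print(has_same_digits("34789"))  # False
```

Each coefficient costs only a handful of operations, so the total work grows roughly linearly with the length of the string instead of quadratically.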

Final Thoughts

For small coding tasks both models work fine, but they need explicit and careful prompting. For rendering-related tasks, QwQ is the better choice.
What holds me back from using Reka Flash 3 is that its reasoning steps are embedded in the output itself rather than in a separate collapsible section, along with its frequent hallucinations.
As for me, I would use neither and would go with Gemma or Llama instead. In my experience they are much better at coding and rendering tasks.

Let's move on and check the models on the next task.

Reasoning

The second domain for testing is Reasoning. Modern LLMs are great at finding details and solving problems. Let’s see how both models perform on my reasoning questions.

River Crossing Puzzle - Modified

This question is a modified version of a Flash game I used to play as a kid. The idea is that a farmer has to transfer a wolf, a cow, and fodder across a river, but he can carry only one at a time, and if two of them are left alone the food chain kicks in. The puzzle is to work out how to get everything across.

The modified version makes it simple by introducing 3 secure compartments while keeping the other details the same. Spotting this is the key to the solution: all three can be taken at once because the compartments are isolated.

Let’s see which models will find this out.

Prompt: Crossing The River

A farmer wants to cross a river and take with him a wolf, a goat and a cabbage. He has a boat with three secure separate compartments. If the wolf and the goat are alone on one shore, the wolf will eat the goat. If the goat and the cabbage are alone on the shore, the goat will eat the cabbage. How can the farmer efficiently bring the wolf, the goat and the cabbage across the river without anything being eaten?

Output: Reka Flash 3

Take all three animals (wolf, goat, cabbage) across the river simultaneously.

Place each animal in a separate compartment to prevent interaction.
Since none of the dangerous pairs (wolf-goat or goat-cabbage) are left alone on either shore, both banks remain safe throughout the trip.

Ans : Take all three in separate compartments in one trip

Fantastic! It picked up on the key detail and solved the problem in 64.9 seconds (most of the time went to COT reasoning, 4391 tokens). For an open-source model, that's good!

Now let's check the response from QwQ 32b.

Output: QwQ 32b

This is a classic logic puzzle that requires careful planning to ensure that the wolf, goat, and cabbage are transported across the river without any of them being eaten. So answer is:

1. Take the goat to the far shore.
2. Return alone.
3. Take the wolf to the far shore.
4. Bring the goat back to the starting shore.
5. Take the cabbage to the far shore.
6. Bring the goat to the far shore.

### Final Answer:
{1. Take goat to far shore, 2. Return alone, 3. Take wolf to far shore, 4. Bring goat back, 5. Take cabbage to far shore, 6. Bring goat to far shore.}

The model missed the key detail of the question and fell back to the classic logic puzzle answer, so it produced an incorrect response, in 21.8 seconds (560 tokens).
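The gap between the two answers can be checked mechanically. Below is a small breadth-first search sketch that counts the minimum number of crossings for a given boat capacity. The state encoding and function names are my own assumptions, and it models only the rules stated in the prompt.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"wolf", "goat", "cabbage"})
UNSAFE_PAIRS = [{"wolf", "goat"}, {"goat", "cabbage"}]


def is_safe(bank):
    """A bank without the farmer is safe if it holds no predator/prey pair."""
    return not any(pair <= bank for pair in UNSAFE_PAIRS)


def min_crossings(capacity):
    """Fewest one-way trips needed when the boat carries up to `capacity` items."""
    start = (ITEMS, "left")             # (items on the left bank, farmer's side)
    goal = (frozenset(), "right")
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, farmer), trips = queue.popleft()
        if (left, farmer) == goal:
            return trips
        here = left if farmer == "left" else ITEMS - left
        for size in range(capacity + 1):
            for cargo in combinations(sorted(here), size):
                cargo = frozenset(cargo)
                if farmer == "left":
                    new_left, new_farmer = left - cargo, "right"
                    unattended = new_left
                else:
                    new_left, new_farmer = left | cargo, "left"
                    unattended = ITEMS - new_left
                state = (new_left, new_farmer)
                if is_safe(unattended) and state not in seen:
                    seen.add(state)
                    queue.append((state, trips + 1))
    return None


print(min_crossings(3))  # 1: the three-compartment boat from the prompt
print(min_crossings(1))  # 7: the classic one-item boat
```

With three compartments a single trip is enough, which is the answer Reka Flash 3 gave. The classic multi-trip plan only becomes necessary when the boat holds one item.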

Now let's test them on another question.

Minimum Number of Weighings Needed

This question was asked to me by my uncle. It falls under binary decision testing (divide and conquer), logical deduction, and optimization. I failed to solve it, so let's see how both models perform.

Prompt: Minimum Weight Balancing

You are given 8 identical-looking balls. One of them is slightly heavier, but you don't know which one. You have a balance scale that allows you to compare the weight of two sets of balls. What is the minimum number of weights necessary to guarantee finding the heavier ball?

Solution

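Divide the 8 balls into 3-3-2 groups.
Weigh the two groups of 3 balls against each other.
If one group is heavier, weigh any 2 balls from that group against each other.
If one of the 2 balls is heavier, you have found the heavier ball; if they balance, the third ball of that group is the heavier one.
If the two groups of 3 balance, weigh the remaining 2 balls against each other to find the heavier ball.

Expected: 2 weighings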
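The strategy above can be verified exhaustively with a short simulation. This is a minimal sketch under my own naming; it places the heavy ball in each of the 8 positions and counts how many times the scale is used.

```python
def find_heavy_ball(weights):
    """Locate the single heavier ball among 8 using the 3-3-2 split.

    Returns (index_of_heavy_ball, number_of_weighings_used).
    """
    weighings = 0

    def weigh(left, right):
        """Return 1 if the left pan is heavier, -1 if the right is, 0 if balanced."""
        nonlocal weighings
        weighings += 1
        left_total = sum(weights[i] for i in left)
        right_total = sum(weights[i] for i in right)
        return (left_total > right_total) - (left_total < right_total)

    group_a, group_b, group_c = [0, 1, 2], [3, 4, 5], [6, 7]
    first = weigh(group_a, group_b)
    if first == 0:
        # The heavy ball is one of the two left aside.
        second = weigh([group_c[0]], [group_c[1]])
        heavy = group_c[0] if second > 0 else group_c[1]
    else:
        suspects = group_a if first > 0 else group_b
        second = weigh([suspects[0]], [suspects[1]])
        if second == 0:
            heavy = suspects[2]          # the ball that was not weighed
        else:
            heavy = suspects[0] if second > 0 else suspects[1]
    return heavy, weighings


# Try every possible position for the heavier ball.
for position in range(8):
    balls = [10] * 8
    balls[position] = 11
    print(position, find_heavy_ball(balls))
```

Every run reports the correct ball after exactly 2 weighings. One weighing can never be enough, since a single use of the scale has only three outcomes and there are 8 candidates.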

Output: Reka Flash 3 (Beautified)

{COT}
In all cases, the maximum number of weighings required is 2. This confirms that the minimum number of weights necessary to guarantee finding the heavier ball is indeed 2.
Ans : 2

Perfect! Reka Flash 3 nailed it. It not only solved the problem but also verified the answer across multiple scenarios, in 37.2 seconds (2452 tokens).

Let's see how QwQ does.

Output: QwQ 32b (Beautified)

{COT}
1. Divide 8 balls into Group A (3), Group B (3), and Group C (2).
2. Weigh Group A vs. Group B; if unbalanced, the heavier ball is in that group; else, it's in Group C.
3. Weigh two balls from the identified group; if unbalanced, the heavier one is found; else, it's the third ball.

Final Answer: 2 weighings.

QwQ also solved the problem easily, but it didn't verify its result across multiple scenarios. On the other hand, it took only 12.8 seconds (2383 tokens).

Final Thoughts

For general-purpose and simple reasoning tasks, go for QwQ 32b, as it responded faster and used fewer tokens in these tests. That makes it a good choice if you are on a tight budget.
However, if you need complex reasoning tasks solved, go for Reka Flash 3. Although both are open source, it did much better on the trickier question. You can still opt for QwQ on a tight budget, but the results may not be optimal.
As for me, I will go for Grok 3 Deep Think (still open), as its reasoning abilities are great!

Anyway, let's move on and check the models on the next task.

Math

The third domain for testing is Math, which also involves reasoning. Modern LLMs are great at solving complex math problems. Let’s see if both models can compete. To keep it fair, I’ll test them on one simple and one complex problem.

Sequence Puzzle

A simple question for humans but a tough one for LLMs. Many models have failed to get it right. Let’s see if QwQ or Reka can crack it or not with reasoning steps!

Prompt: Sequence Puzzle

What is the next number in the sequence? 2, 6, 12, 20, 30, ?

Output: Reka Flash 3

{COT}
The next term should be 6×7=42.

Both methods confirm that the next number in the sequence is 42

Nice! The model got the right answer in 16.4 seconds (984 tokens). After checking the COT traces, I found it reasoned like a human, not a mathematician, using simple arithmetic instead of complex formulas. This makes the explanation easy to understand!
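The two simple checks that lead to 42 can be written out in a few lines. This is my own sketch of the reasoning, not the model's trace; the assumption is that "both methods" refers to the growing differences and to the pattern n × (n + 1).

```python
def next_by_differences(sequence):
    """Extend the sequence assuming the gaps keep growing by a constant step."""
    gaps = [b - a for a, b in zip(sequence, sequence[1:])]   # 4, 6, 8, 10
    step = gaps[-1] - gaps[-2]                               # gaps grow by 2
    return sequence[-1] + gaps[-1] + step


def next_by_formula(sequence):
    """Extend the sequence assuming the n-th term is n * (n + 1)."""
    n = len(sequence) + 1
    return n * (n + 1)


terms = [2, 6, 12, 20, 30]
print(next_by_differences(terms))  # 42
print(next_by_formula(terms))      # 42
```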

Reka Flash 3 - Reasoning Steps

Output: QwQ 32b

{COT}
The next term is 42

QwQ also solved the problem but required more time and tokens (39.8 seconds, about 1785 tokens). This was because it reasoned like a mathematician, deriving a general formula as part of verifying the solution.

QwQ 32b → Reasoning Steps

Now let's test the models on a medium-to-complex problem to find their real capabilities.

GCD of a Series Set

I have used this question before as a hard one. The proprietary models all solved it, so let's see how the open-source ones do.

Prompt: Series GCD

Find the GCD of this series set: {n^99(n^60-1): n>1}

Output: Reka Flash 3


{COT}
The product of these primes is 2×3×5×7×11×13×31×61=56704530.

Thus, the GCD of the series set is 5670453

As expected, the model got it wrong, whereas o1 got it right. It multiplied each prime only once and missed the higher powers (2^4, 3^2, 5^2) that appear in the correct answer. This fits my experience that open-source models are more prone to hallucination. Inspecting the COT traces confirmed it: they were full of "But wait" moments.

COT Reasoning Steps

Output: QwQ 32b

{COT}
16×9×25×7×11×13×31×61=6,814,407,600

Final Answer
The greatest common divisor of the set is 6814407600

Despite the problem being hard, QwQ got it right! Upon inspection, I noticed the model started to hallucinate but then corrected itself through a self-evaluation step, something missing in Reka Flash 3.
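QwQ's answer can be sanity-checked numerically by taking the GCD over a finite prefix of the set. This is only a sketch: the cutoff of 1400 is my own choice, picked to sit above 1321, the largest prime factor of 2^60 - 1, so that every stray prime from the first term gets eliminated by some later term.

```python
from functools import reduce
from math import gcd


def series_gcd(limit):
    """GCD of n^99 * (n^60 - 1) for n = 2 .. limit (a finite prefix of the set)."""
    return reduce(gcd, (n**99 * (n**60 - 1) for n in range(2, limit + 1)))


expected = 16 * 9 * 25 * 7 * 11 * 13 * 31 * 61
print(expected)                      # 6814407600
print(series_gcd(1400) == expected)  # True
```

The primes involved are exactly those p for which p - 1 divides 60, and 2, 3, and 5 appear with higher powers. That is the part Reka Flash 3 missed.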