I was recently thinking about how effective reading is compared to Anki.

Suppose I want to learn 2000 words, which is B1 in CEFR. How many pages do I need to read daily to memorize them effectively?

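According to Zipf's Law, a word's frequency is roughly inversely proportional to its rank, so the least frequent of those words (rank 2000) will appear with the probability P/2000, where P is the share of the most frequent word. For English P≈7% (and it is similar for other languages).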

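If a text has W words, then the chance that the least frequent word appears in it is approximately W*7/2000 percent. (Strictly speaking, this is the expected number of occurrences expressed as a percentage, which approximates the probability well only while it is small.) In pages, assuming an average page has 500 words, that is 500*N*7/2000 = N*7/4 percent.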

How many pages do I need to read so that even the least frequent of the 2000 most frequent words is expected to appear once? Setting the estimate to 100%:

N*7/4 = 100,

N≈60 pages
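The arithmetic above can be checked with a few lines of Python. This is only a sketch under the post's own assumptions (P = 7%, 500 words per page). The function names are made up for illustration, and the Poisson approximation used for "at least one occurrence" is an extra assumption that is not part of the original estimate.

```python
import math

ZIPF_P = 0.07          # share of the most frequent word (the post's P = 7%)
WORDS_PER_PAGE = 500   # the post's assumed page length


def expected_occurrences(rank: int, pages: float) -> float:
    """Expected count of the rank-th most frequent word in `pages` pages."""
    return pages * WORDS_PER_PAGE * ZIPF_P / rank


def pages_for_one_occurrence(rank: int) -> float:
    """Pages needed for the expected count of the rank-th word to reach 1."""
    return rank / (WORDS_PER_PAGE * ZIPF_P)


def prob_at_least_once(rank: int, pages: float) -> float:
    """Poisson approximation (an assumption): P(word occurs at least once)."""
    return 1.0 - math.exp(-expected_occurrences(rank, pages))


n = pages_for_one_occurrence(2000)
print(f"pages for one expected occurrence: {n:.1f}")
print(f"chance of seeing the word at least once: {prob_at_least_once(2000, n):.2f}")
```

Under these assumptions the exact value is about 57 pages, which the post rounds to 60. If word occurrences are treated as roughly Poisson, one expected occurrence corresponds to about a 63% chance of actually meeting the word, so the 60-page figure is better read as "once on average" than as a guarantee.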

Let's assume that I need to see a word every day in order to memorize it. That gives me 60 pages/day for B1.

The number changes proportionally to the vocabulary size required for each CEFR level (https://universeofmemory.com/how-many-w ... ould-know/):

A2 - 30 pages/day

B1 - 60 pages/day

B2 - 120 pages/day

C1 - 240 pages/day

C2 - 480 pages/day.
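The scaling in this list can be reproduced with a short sketch. The vocabulary sizes below are hypothetical: they are simply chosen to match the doubling in the list (with B1 = 2000 words as stated earlier), not copied from the linked page.

```python
ZIPF_P = 0.07          # the post's P = 7%
WORDS_PER_PAGE = 500   # the post's assumed page length

# Hypothetical vocabulary sizes, chosen to match the doubling in the list above.
VOCAB_BY_LEVEL = {"A2": 1000, "B1": 2000, "B2": 4000, "C1": 8000, "C2": 16000}


def pages_per_day(vocab_size: int) -> float:
    """Pages needed for the rarest target word to be expected once."""
    return vocab_size / (WORDS_PER_PAGE * ZIPF_P)


estimates = {level: pages_per_day(size) for level, size in VOCAB_BY_LEVEL.items()}
for level, pages in estimates.items():
    print(f"{level} - {pages:.0f} pages/day")
```

The unrounded results come out slightly below the figures in the list, because the list starts from the rounded value of 60 pages for B1 and doubles from there.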

While it looks doable for lower levels, consuming nearly 500 pages/day (or an equivalent amount of other input) is practically impossible: assuming a reading speed of around 250 words per minute, that's about 16 hours of reading! It seems to imply that in order to improve to C1-C2, you should either do a full-day immersion, or use Anki.

(It remains an open question, though, how effective Anki is when you learn a word in a single context, or even in several contexts found by sentence mining, compared to encountering the word naturally in a book.)

EDIT: Perhaps I should divide the number of pages by 2 or even by 3, because seeing a word once every two or three days, rather than every day, should be enough. Also, if I want to learn words on a specific topic, choosing a book on that topic increases how often those words appear. Perhaps the numbers won't be so high after all. If I divide my initial estimates by 4, I get:

A2 - 7 pages/day

B1 - 15 pages/day

B2 - 30 pages/day

C1 - 60 pages/day

C2 - 120 pages/day

which is hard for C2, given that reading should be regular, but is pretty doable for C1.
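To get a feel for what these adjusted numbers mean in practice, here is a small sketch that applies the divisor and converts pages to reading time. The reading speed of 250 words per minute is an assumption of mine, not a figure from the post, and the function names are hypothetical.

```python
WORDS_PER_PAGE = 500      # the post's assumed page length
READING_SPEED_WPM = 250   # assumed reading speed, not from the post

# Initial estimates from the first list (pages/day).
INITIAL = {"A2": 30, "B1": 60, "B2": 120, "C1": 240, "C2": 480}


def adjusted(pages: float, divisor: float = 4) -> float:
    """Scale an estimate down for spaced exposure and topic-focused reading."""
    return pages / divisor


def hours(pages: float) -> float:
    """Reading time in hours at the assumed speed."""
    return pages * WORDS_PER_PAGE / READING_SPEED_WPM / 60


for level, pages in INITIAL.items():
    p = adjusted(pages)
    print(f"{level} - {p:g} pages/day, about {hours(p):.1f} h")
```

With a divisor of 4, A2 comes out at 7.5 pages, which the list above rounds down to 7. Changing `divisor` or `READING_SPEED_WPM` shows how sensitive the conclusion is to those two guesses.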

I think more research can be done on this topic, with statistical tests applied to a text corpus. I want to prove the point that the effect of using Anki is comparable to that of a sufficient amount of regular active input in L2, and that the usefulness of Anki is exaggerated.
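As a starting point for such a corpus test, here is a minimal sketch that measures how many of the most frequent words actually show up in each "day" of reading. Everything here is illustrative: the tokenizer is deliberately crude, the function names are hypothetical, and the word ranking is taken from the same text being read rather than from a separate frequency list.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very rough tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def mean_daily_coverage(tokens: list[str], top_k: int, words_per_day: int) -> float:
    """Average fraction of the top_k most frequent words seen in each daily chunk."""
    targets = [word for word, _ in Counter(tokens).most_common(top_k)]
    days = [
        set(tokens[i:i + words_per_day])
        for i in range(0, len(tokens) - words_per_day + 1, words_per_day)
    ]
    if not targets or not days:
        raise ValueError("need at least one target word and one full day of text")
    per_day = [sum(word in day for word in targets) / len(targets) for day in days]
    return sum(per_day) / len(per_day)


# Toy example; a real test would load a book-length text instead.
sample = tokenize("The cat sat on the mat. The dog sat on the log.")
print(mean_daily_coverage(sample, top_k=3, words_per_day=3))
```

On a real corpus one would set `top_k` to 2000 and `words_per_day` to pages × 500, then compare the measured coverage against the Zipf-based estimates above.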