Mark Davies, who developed the invaluable Corpus of Contemporary American English (COCA), recently launched two new language corpuses: Corpus del Español and Corpus do Português.
Once you get the hang of the query syntax and the user interface (which can be daunting at first), you can search a large database of Spanish and Portuguese sentences to answer questions such as which preposition goes with insistir or which synonym of duro is most commonly used with trabajo.
Unlike Google, Mark’s corpuses allow you to search for all the grammatical forms of a word (just put the base form of the word in [brackets]), specify parts of speech (e.g. [v*] stands for any verb in any form), or search by proximity (e.g. find all adjectives within 5 words of ojos). They will also sort the results by frequency, which can be a real time-saver.
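To make the three kinds of search concrete, here is a minimal Python sketch of the ideas behind them. It does not talk to the corpus website. It runs on a tiny hand-made list of tagged tokens, and the data, the tag names and the helper functions `forms_of` and `nearby` are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: (word form, base form, part-of-speech tag).
# A real corpus holds millions of such tokens; these twelve are invented.
TOKENS = [
    ("tiene", "tener", "v"), ("los", "el", "det"), ("ojos", "ojo", "n"),
    ("azules", "azul", "adj"), ("y", "y", "conj"), ("tenía", "tener", "v"),
    ("unos", "uno", "det"), ("ojos", "ojo", "n"), ("muy", "muy", "adv"),
    ("grandes", "grande", "adj"), ("y", "y", "conj"), ("azules", "azul", "adj"),
]


def forms_of(base_form, tokens):
    """Return every word form that shares the given base form."""
    return sorted({form for form, base, _ in tokens if base == base_form})


def nearby(target_form, pos, window, tokens):
    """Count base forms tagged `pos` within `window` words of each
    occurrence of `target_form`, most frequent first."""
    counts = Counter()
    for i, (form, _, _) in enumerate(tokens):
        if form != target_form:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j][2] == pos:
                counts[tokens[j][1]] += 1
    return counts.most_common()


print(forms_of("tener", TOKENS))         # all forms of one base form
print(nearby("ojos", "adj", 5, TOKENS))  # adjectives within 5 words of "ojos"
```

The counts are per occurrence of the target word, so an adjective that sits near two different instances of ojos is counted twice. The real corpus interface may count differently.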
Chris Linguistiko Feb 25, 2011 at 3:44 pm
This is a great find; these are very good corpuses. I will certainly be using them to supplement my Spanish study. Now all they need is audio recordings!
Ben James Feb 28, 2011 at 4:07 pm
This is a great tool. However, I do agree with Chris that audio recordings would make it much better.
michau Mar 5, 2011 at 8:32 pm
Perhaps I’m nitpicking, but shouldn’t it read “corpora”? I’ve never seen the word “corpuses” in the context of large collections of texts. Google confirms my intuition: a search for “corpuses” leads mainly to Christian websites, while a search for “corpora” leads to sites related to linguistics.
Tom Mar 6, 2011 at 1:59 pm
I’m not a big fan of Latin-derived plurals. They introduce unnecessary irregularity. If we are supposed to say “cacti” and “corpora” instead of “cactuses” and “corpuses”, then why not “circi” instead of “circuses”?
I use “corpuses” because I want to promote this form and because my Random House Webster’s Dictionary says it is correct. I realize I’m in the minority, but should I always follow the majority?