Topicmodels, topicmodels, …

I have previously done some topic modelling using LDA (Latent Dirichlet Allocation). Back then I used a nice video from some nice guy but somehow could not find the video with search engines anymore. Too bad. Implemented LDA in Java back then based on that tutorial. I learned how it works, not why it works. Still don’t quite get why the set of topics emerges from the algorithm.

Actually I found a reasonably good explanation on Quora. Well, it is a good one if you already know most of how LDA works. Eh. Also a tutorial briefly summarizing how online LDA works, which is a nice improvement, and I guess what the tools use these days.

The number of topics LDA produces is given as a parameter, and it is always a bit of a puzzle for me how to pick the best number of topics. Googling for it, I found various references to using “perplexity” to choose the best number of topics. I still have not found a good “for dummies” explanation for what that really means in practice for LDA, or how to implement it. Maybe some of the libs out there will do it for me? Python seems all the rage in data science these days, because whatever. So after a few searches, gensim it is.
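
For what it is worth, the basic idea behind perplexity can be sketched without any library. This is a toy illustration only, not how Gensim computes it (Gensim works from a variational bound). Perplexity is the exponential of the negative average per-word log-likelihood on held-out text, so lower is better. The function name and the probabilities below are made up for the example.

```python
import math

def perplexity(word_probs):
    """Perplexity from the probability the model gave each held-out word.

    word_probs: list of probabilities in (0, 1], one per word token.
    """
    avg_log_likelihood = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_likelihood)

# hypothetical numbers: a model that gives every word probability 1/50
# is as "confused" as a uniform guess among 50 words
uniform_50 = perplexity([1 / 50] * 200)

# a model that gives the same words higher probability gets lower perplexity
better = perplexity([1 / 10] * 200)

print(uniform_50, better)
```

Read this way, a perplexity of 50 means the model is about as unsure as if it were picking uniformly among 50 words for every token. Comparing topic counts by perplexity then means computing this on documents the model did not see during training.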

Gensim seems to have some perplexity options and a bunch of weird formulas to apply. Is it so hard to write some simple docs and explain these things? I guess nobody pays people to do it, and doing it for free would just go against the goal of making oneself important. Sort of makes sense, and applies to most OSS software I have used. Or maybe I am just bad at using stuff.

Anyway. There is also something called topic coherence in Gensim. This is supposed to be some way to evaluate the number of topics. Somehow the explanation does not work for me. I did not quite grasp how it works for real. So I just gave it a try to see what I get, that would be most important for me regardless.

I start with the English wikipedia (I used a May 2017 dump). Because it is sorta big and I can put the results here, everyone knows it and it’s public data. Gensim nicely comes with a script to parse it for dictionary and corpus:

python -m gensim.scripts.make_wiki

Then some code to build different sizes of topic models (25 to 200 topics in 25 topic size increments)

import logging, gensim, bz2
import os, sys

#http://stackoverflow.com/questions/13733552/logger-configuration-to-log-to-file-and-print-to-stdout
#https://aykutakin.wordpress.com/2013/08/06/logging-to-console-and-file-in-python/
#configure_log function reconfigures python logging to write to the specific directory for the analysis size. so lda25 log goes into lda25 dir
def configure_log(log_path, log_name):
    logFormatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-5.5s]  %(message)s")
    rootLogger = logging.getLogger()
    rootLogger.setLevel(logging.INFO)

    #http://stackoverflow.com/questions/12034393/import-side-effects-on-logging-how-to-reset-the-logging-module
    #http://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list
    #need to copy the list of handlers or we will be iterating what we are modifying and it will fail to work as intended
    handlers_to_remove = rootLogger.handlers[:]
    for handler in handlers_to_remove:
        rootLogger.removeHandler(handler)
        
    filters_to_remove = rootLogger.filters[:]
    for filter in filters_to_remove:
        rootLogger.removeFilter(filter)

    fileHandler = logging.FileHandler("{0}/{1}.log".format(log_path, log_name))
    fileHandler.setFormatter(logFormatter)
    rootLogger.addHandler(fileHandler)

    consoleHandler = logging.StreamHandler(sys.stdout)
    consoleHandler.setFormatter(logFormatter)
    rootLogger.addHandler(consoleHandler)

#load wikipedia dictionary. this gets generated by the gensim wikipedia script
id2word = gensim.corpora.Dictionary.load_from_text('wikires_wordids.txt.bz2')
#and the wikipedia corpus
mm = gensim.corpora.MmCorpus('wikires_tfidf.mm')

sizes = [25, 50, 75, 100, 125, 150, 175, 200]

#ensure_dir makes sure a given path exists, creating if needed
def ensure_dir(file_path):
    directory = os.path.dirname(file_path)
    if not os.path.exists(directory):
        os.makedirs(directory)

#run gensim LDA using autotuning for the hyperparameters
def run_auto():
    for size in sizes:
        dir = "lda_auto"+str(size)+"/"
        ensure_dir(dir)
        configure_log(dir, "lda_auto"+str(size))
        lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=size, update_every=1, chunksize=10000, passes=1, alpha="auto", eta="auto")
        lda.print_topics(20)
        lda.save(dir+"a_model"+str(size)+".lda")

#run gensim LDA using default values for the hyperparameters
def run_default():
    for size in sizes:
        dir = "lda"+str(size)+"/"
        ensure_dir(dir)
        configure_log(dir, "lda"+str(size))
        lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=size, update_every=1, chunksize=10000, passes=1)
        lda.print_topics(20)
        lda.save(dir+"model"+str(size)+".lda")

run_default()
run_auto()

The code above drops a set of 8 different sized topic models into matching directories, both for default parameters and autotuned parameters. Takes a while to run. The machine I ran it on has 32GB RAM and a quad-core Core i7 processor (hyperthreads to 8 virtual cores). Resource use? I actually found the Gensim implementations are quite nicely optimized not to take huge amounts of memory, and they also pretty much make use of all the cores in a system. Except perhaps the topic coherence ones, which still seemed to run on a single core. Perhaps because they seem relatively new?

My first mistake in this regard was to think of LDA as a single-core solution. I implemented the original algorithm some time back, and did not see it becoming anything else. But the online version seems to process the corpus in batches, which I guess makes it more parallelizable. And the Gensim docs also nicely describe how running this online algorithm merges the results in a way that you don’t necessarily need to run large numbers of passes (iterations) over the corpus to converge on a better model. Chunksize 10000 in the above code seems to cause this merge after each 10000 docs, and with Wikipedia having about 4 million articles, this amounts to quite a few merges. Maybe somewhat equal to the iterations of old.
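
A rough sketch of the arithmetic, assuming each chunk triggers one merge (update_every=1) and assuming the weight of each new chunk follows a schedule of the form rho_t = (offset + t)^(-decay), as in the online LDA paper by Hoffman et al. The offset=1.0 and decay=0.5 values are assumed defaults, and the helper names are made up. The 4 million figure is the rough article count mentioned above.

```python
import math

def merges_per_pass(num_docs, chunksize):
    # the last chunk may be smaller than chunksize but still counts as one merge
    return math.ceil(num_docs / chunksize)

def rho(t, offset=1.0, decay=0.5):
    # weight given to the newest chunk at merge number t (0-based)
    return (offset + t) ** -decay

wiki_merges = merges_per_pass(4_000_000, 10_000)  # 400 merges in one pass
first_rho = rho(0)                                # the first chunk gets full weight
last_rho = rho(wiki_merges - 1)                   # about 0.05 by the end of the pass

print(wiki_merges, first_rho, last_rho)
```

So under these assumptions a single pass over Wikipedia already means hundreds of merges, with each new chunk counting for less and less as the run goes on.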

With logging enabled, Gensim prints some texts about “topic diff” between each batch and merge. This seems to indicate how much the topic model changed between the runs. So I plotted the topic diff for the wikipedia run (when generating the LDA models), to see how much the topics drift during the run. See figure below for the 8 sizes I used, using Gensim default LDA parameters:

lda_grid

And for using the autotuned parameters:

lda_a_grid

From this, it seems the topic model actually pretty much “converges” quite early in the process. That is, the topic diff goes down to a small number and the topics become quite stable across merges/iterations. Maybe because there is so much data in this dataset? And the autotuned version seems much more direct to converge. So I will use that later.

After this, I ran the same analysis on a bunch of document sets I have from different Finnish organizations. I won’t be putting the exact data for those documents online here, but I will show some statistics on the runs and the models produced, as well as my feeling from looking at the topics generated and the stats. Some stats when running the autotuned version (because the autotuned seemed to converge faster and about equally on quality on wikipedia):

type id doc count
1 3651
2 1930
3 679
4 5596
5 1058
6 343
7 228
8 1069
9 333
10 213
11 279
12 316
13 592
14 397
15 104
16 1076
17 1648

Since these have a very small number of documents when compared to Wikipedia, I ran the Gensim LDA model generator for them in the online mode using a batch size of 1000. Separately with 10 iterations and 100 iterations to get some comparable data on the impact of iteration counts. Listing all 3×3 grids for the 17 document sets would be a bit much to show here. So after looking at them, I figured they were mostly similar but with maybe a few minor differences. So I picked three types (based on my feelings when looking at the figures):

Type 1 (this grid is for doc set with type id 6 from above):
10 iterations:
t6_bd_lda_a_grid

100 iterations:
t6_bd_lda_a100_grid

Type 2 (this grid is for doc set with type id 5 from above):
10 iterations:
t5_j_lda_a_grid

100 iterations:
t5_j_lda_a100_grid

Type 3 (this grid is for doc set with type id 7 from above):
10 iterations:
t7_sd_lda_a_grid

100 iterations:
t7_sd_lda_a100_grid

Remember, the types are just something I made up myself. I chose Type 1 to refer to models where there was a big difference from 10 iterations to 100 iterations in the final topic diff for the 25 topic run. In the example Type 1 figures here (for doc type 6), the 10 iteration run gets to around 0.25 final diff. In my set for type 1, document sets 2, 16, and 17 had the biggest diff of about 0.5 in the end after 10 iterations. Document sets 3, 6, 9, 12, 13, and 14 were close to 0.2 diff after 10 iterations. Document sets 10 and 11 were close to 0.1 diff for 10 iterations. Each of these was close to 0 final diff after 100 iterations.

Type 2 refers to models where the 25 topics line has a noticeable “jiggly” effect to it. Maybe this is between the iterations (or “passes”)? Not sure how Gensim restarts iterations, so could have something to do with it. Topics for document sets 5 and 8 had the biggest such effects, as also shown in the Type 2 figure above for document set 5. For document sets 1 and 4, the effect was smaller but still seemed to be there.
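
One thing that is cheap to check for the jiggle is how the documents split into chunks within a pass. A minimal sketch, assuming Gensim simply cuts the corpus into consecutive chunks of chunksize and updates the model after each one (the helper name is made up, the doc counts are from the table above):

```python
def chunk_sizes(num_docs, chunksize=1000):
    """Sizes of the consecutive chunks in one pass over num_docs documents."""
    full, rest = divmod(num_docs, chunksize)
    return [chunksize] * full + ([rest] if rest else [])

# doc set id -> doc count, taken from the table above
doc_counts = {5: 1058, 8: 1069, 1: 3651, 4: 5596, 6: 343}
splits = {doc_set: chunk_sizes(n) for doc_set, n in doc_counts.items()}

for doc_set, sizes_in_pass in splits.items():
    print(doc_set, sizes_in_pass)
```

With these counts, sets 5 and 8 alternate between one full chunk of 1000 and a tiny chunk of 58 or 69 documents on every pass, while set 6 fits in a single chunk. That alternation could plausibly show up as a zigzag in the topic diff, but it is only a guess: set 16 (1076 docs) has a similar split and was grouped under Type 1 above.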

Type 3 refers to models where there was no big difference in final topic diff in 10 vs 100 iterations. This was just the models for document sets 7 and 15. These are also the two smallest document sets (least docs). Maybe smaller sets converge better with fewer iterations?

Looking at the document type count table above, there is no clear correlation with document count and the types of figures (1,2,3) I used above. There could be other differences in properties of the documents (e.g., length, number of real distinct topics embedded in each). Not in my scope to investigate further, but the reasons could be anything, what do I know.

The properties I used to select the types are mostly visible in the smaller number of topics. With higher number of topics they all seem quite similar. Maybe the algorithm has to work harder to fit the data into fewer topics? Or maybe I just have so little data there that larger number of topics always produces garbage topics uniformly? No idea, really.

The code I used to run this is here:

__author__ = 'teemu kanstren'

#loads docs from es and runs lda on those, saves the model

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from gensim.corpora.dictionary import Dictionary
from gensim import corpora
import gensim
import logging, sys, os

#configure logging for gensim and other packages to write to correct dir and with given log file name
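def configure_log(log_path, log_name):
    #this is the same code as before for this function, so not repeating it here.
    #copy the body from the first listing before running this script
    raise NotImplementedError("copy the body of configure_log from the first listing")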

#ensures a dir exists
def ensure_dir(file_path):
    directory = os.path.dirname(file_path)
    if not os.path.exists(directory):
        os.makedirs(directory)

configure_log(".", "teemu")

es = Elasticsearch()

indices=es.indices.get_alias().keys()
print(indices)

#get mapping for the index we are interested in
mapping = es.indices.get_mapping("my_index")
print(mapping)

#find all document types in the mapping
keys = mapping["my_index"]["mappings"].keys()
types = [key for key in keys]
print(types)

fields = es.indices.get_field_mapping(index="my_index", fields="*")
print(fields)

#https://marcobonzanini.com/2015/02/02/how-to-query-elasticsearch-with-python/

def process_search(s, dirname, filename):
    dir = "output/"+dirname+"/"
    ensure_dir(dir)
    count = 0

    dict = Dictionary()

    for hit in s.scan():
        #    print(hit.meta.score, hit.file_name)
        #    print(count)
        #skip file if we are lazy with the query writing and potentially loading too many and need a specific field
        if "my_contents" not in hit: continue
        count += 1
        # update dictionary with document words
        dict.doc2bow(hit.my_contents.split(), allow_update=True)

    print(count)
    print(dict)

    corpus = []
    for hit in s.scan():
        if "my_contents" not in hit: continue
        line = dict.doc2bow(hit.my_contents.split())
        corpus.append(line)

    dict.save(dir+filename+"_hellome.dict")
    corpora.MmCorpus.serialize(dir+filename+'_hellome-corpus.mm', corpus)

    # exit()

    sizes = [25, 50, 75, 100, 125, 150, 175, 200]
    for size in sizes:
        configure_log(dir, filename+"_lda_auto" + str(size))
        lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dict, num_topics=size, update_every=1, chunksize=1000, passes=10, alpha="auto", eta="auto")
        lda.print_topics(size)
        lda.save(dir + filename+"_a_model" + str(size) + ".lda")

    for size in sizes:
        configure_log(dir, filename+"_lda_auto_100" + str(size))
        lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dict, num_topics=size, update_every=1, chunksize=1000, passes=100, alpha="auto", eta="auto")
        lda.print_topics(size)
        lda.save(dir + filename+"_a_model_100" + str(size) + ".lda")

for type in types:
    #this is simply if you want to combine several, so the ES query is just a list for doc_type
    s = Search(using=es, index="my_index", doc_type=[type, type+"_extra_field"]) \
        .query("match_all").sort("doc_id")
    process_search(s, type, type)

And to plot it:

__author__ = 'teemu kanstren'

import matplotlib.pyplot as plt
import sys

dirname=sys.argv[1]

sizes = [25, 50, 75, 100, 125, 150, 175, 200]

def read_log_data(fileprefix):
    log_data = []
    for size in sizes:
        #http://stackoverflow.com/questions/8009882/how-to-read-large-file-line-by-line-in-python
        with open(fileprefix+str(size)+".log") as f:
            topic_diffs = []
            rhos = []
            iterations = []
            td_str = "topic diff="
            td_str_len = len(td_str)
            rho_str ="rho="
            rho_str_len = len(rho_str)
            i = 0
            for line in f:
                ti = line.find(td_str)
                ri = line.find(rho_str)
                if ti > 0 and ri > 0:
                    iterations.append(i)
                    i += 1
                    ti += td_str_len
                    ri += rho_str_len
                    te = line.index(",", ti)
                    re = len(line)
                    topic_diff = float(line[ti:te])
                    rho = float(line[ri:])
                    topic_diffs.append(topic_diff)
                    rhos.append(rho)
            log_data.append((iterations, topic_diffs, rhos))
            print("topic diffs:"+str(topic_diffs))
            print("rhos:"+str(rhos))
    return log_data

def create_plot(log_datum, row, col, topic_n, axarr):
    iterations = log_datum[0]
    topic_diffs = log_datum[1]
    rhos = log_datum[2]
    axarr[row, col].plot(iterations[1:], topic_diffs[1:])
    axarr[row, col].plot(iterations[1:], rhos[1:])
    axarr[row, col].set_title('LDA'+str(topic_n))

def create_plots(suffix):
    plt.figure()
    plt.gcf().set_size_inches(18.5, 10.5)
    f, axarr = plt.subplots(3, 3)

    log_data = read_log_data(dirname+"/"+dirname+suffix)
    #log_data2 = read_log_data(dirname+"/"+dirname+"_lda_auto_100")

    row = 0
    col = 0
    for idx, val in enumerate(log_data):
        create_plot(log_data[idx], row, col, sizes[idx], axarr)
        col += 1
        if col >= 3:
            col = 0
            row += 1

    # Fine-tune figure; make subplots farther from each other.
    f.subplots_adjust(hspace=0.3)

    plt.gcf().set_size_inches(18.5, 10.5)

create_plots("_lda_auto")
plt.savefig(dirname+'/lda_a_grid.png', bbox_inches='tight', dpi=200)
plt.savefig(dirname+'/lda_a_grid.pdf', bbox_inches='tight', dpi=200)

create_plots("_lda_auto_100")
plt.savefig(dirname+'/lda_a100_grid.png', bbox_inches='tight', dpi=200)
plt.savefig(dirname+'/lda_a100_grid.pdf', bbox_inches='tight', dpi=200)
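
The string slicing in read_log_data works, but a regular expression is a bit more forgiving about what surrounds the numbers. A small sketch, assuming the log lines look like the ones the parser above expects (`topic diff=<float>, rho=<float>`). The function name and the sample lines are invented for illustration.

```python
import re

# matches e.g. "... topic diff=0.123456, rho=0.577350"
DIFF_RE = re.compile(r"topic diff=([0-9.eE+-]+), rho=([0-9.eE+-]+)")

def parse_topic_diffs(lines):
    """Return (topic_diffs, rhos) lists from an iterable of log lines."""
    topic_diffs, rhos = [], []
    for line in lines:
        match = DIFF_RE.search(line)
        if match:
            topic_diffs.append(float(match.group(1)))
            rhos.append(float(match.group(2)))
    return topic_diffs, rhos

# made-up sample lines in the log format configured earlier
sample = [
    "2017-06-01 12:00:00 [MainThread  ] [INFO ]  topic diff=0.512345, rho=1.000000\n",
    "2017-06-01 12:00:05 [MainThread  ] [INFO ]  some other message\n",
    "2017-06-01 12:00:09 [MainThread  ] [INFO ]  topic diff=0.101000, rho=0.707107\n",
]
diffs, rhos = parse_topic_diffs(sample)
print(diffs, rhos)
```

Passing an open file object as `lines` works the same way, since files iterate line by line.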

And once the models are built, the Gensim coherence estimator can be run to evaluate which of these is best according to Gensim. I used the u_mass evaluator here, since it works directly on the bag-of-words corpus and does not need the original tokenized texts. According to this website, others such as c_v are more accurate while u_mass is faster. For my experiments I am just looking for a general experience on usefulness of the coherence measure here. If I had more motivation and resources I might try the others as well. Mostly resources, since my results are not too good and further exploration would be interesting to make the results better. But let’s not jump too far. Code:

__author__ = 'teemu kanstren'

from gensim.models.coherencemodel import CoherenceModel
import logging
import gensim, sys

dirname = sys.argv[1]
size = int(sys.argv[2])
dir = dirname+"/"

#first set up python logging to go into the separate subdir+filename for the given dirname and size
logFormatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-5.5s]  %(message)s")
rootLogger = logging.getLogger()
rootLogger.setLevel(logging.DEBUG)

fileHandler = logging.FileHandler(dir+"coherence"+str(size)+".log") #log name
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)

consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
rootLogger.addHandler(consoleHandler)

log = logging.getLogger("bob") #this ("bob") can be whatever but do check python docs

log.info("calculating coherence for size:"+str(size))

log.info("loading dictionary")
dictionary = gensim.corpora.Dictionary.load(dir+dirname+'_hellome.dict')
log.info("loading corpus")
corpus = gensim.corpora.MmCorpus(dir+dirname+'_hellome-corpus.mm')
log.info("loading previously generated lda model")
lda = gensim.models.ldamodel.LdaModel.load(dir+dirname+'_a_model'+str(size)+'.lda')

log.info("building coherence model")
cm = CoherenceModel(model=lda, corpus=corpus, coherence='u_mass')
log.info("cm built, getting coherence")
c = cm.get_coherence() #this is the part that seems to do the calculation and takes a while
log.info("done, c="+str(c))
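
To get a feel for what u_mass measures, here is a toy version of the UMass score from Mimno et al. For each pair of top words in a topic, take log((D(w_i, w_j) + 1) / D(w_i)), where w_i is the higher ranked word and D counts the documents containing the word(s), then average over the pairs. This is a sketch of the idea on made-up documents, not Gensim's implementation, which may differ in details such as smoothing and aggregation.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """Toy UMass coherence for one topic.

    top_words: topic words ordered from most to least probable;
               each is assumed to occur in at least one document.
    docs: list of documents, each a set of words.
    """
    def doc_freq(*words):
        return sum(1 for doc in docs if all(w in doc for w in words))

    scores = []
    # pair each word with every word ranked above it
    for higher, lower in combinations(top_words, 2):
        scores.append(math.log((doc_freq(higher, lower) + 1) / doc_freq(higher)))
    return sum(scores) / len(scores)

# made-up mini corpus
docs = [
    {"cat", "dog", "pet"},
    {"cat", "dog"},
    {"cat", "pet"},
    {"stock", "market"},
    {"stock", "bank"},
]
coherent = umass_coherence(["cat", "dog", "pet"], docs)
mixed = umass_coherence(["cat", "stock", "bank"], docs)
print(coherent, mixed)
```

The topic whose words actually appear together in documents scores higher than the one mixing unrelated words. Words that never co-occur contribute a strongly negative term, which is why scores on real corpora tend to be negative and closer to zero reads as more coherent.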

And to plot it:

__author__ = 'teemu kanstren'

import sys
import matplotlib

#this statement needs to be before importing pyplot if wanting to run in headless mode
matplotlib.use('Agg')
sizes = [25, 50, 75, 100, 125, 150, 175, 200]

import matplotlib.pyplot as plt
from os import walk

dirname=sys.argv[1]

def read_log_data(dirname):
    fileprefix = dirname+"/coherence"
    iterations = []
    for size in sizes:
        #http://stackoverflow.com/questions/8009882/how-to-read-large-file-line-by-line-in-python
        with open(fileprefix+str(size)+".log") as f:
            target_str = " c="
            target_str_len = len(target_str)
            i = 0
            for line in f:
                ti = line.find(target_str)
                if ti > 0:
                    start_i = ti+target_str_len
                    #convert to float so matplotlib plots numbers, not string categories
                    iterations.append(float(line[start_i:]))
                    i += 1
    return iterations

data = read_log_data(dirname)
print(data)

f, ax = plt.subplots()
ax.plot(sizes, data)
ax.set_title('Coherence 10 iterations')
plt.savefig(dirname+'_lda.png', bbox_inches='tight', dpi=200)

And the results for each of the document sets:

Doc set id 10 iterations 100 iterations
1 t1_lda_10 t1_lda_100
2 t2_lda_10 t2_lda_100
3 t3_lda_10 t3_lda_100
4 t4_lda_10 t4_lda_100
5 t5_lda_10 t5_lda_100
6 t6_lda_10 t6_lda_100
7 t7_lda_10 t7_lda_100
8 t8_lda_10 t8_lda_100
9 t9_lda_10 t9_lda_100
10 t10_lda_10 t10_lda_100
11 t11_lda_10 t11_lda_100
12 t12_lda_10 t12_lda_100
13 t13_lda_10 t13_lda_100
14 t14_lda_10 t14_lda_100
15 t15_lda_10 t15_lda_100
16 t16_lda_10 t16_lda_10
17 t17_lda_10 t17_lda_10

So how does all this feel when I load the topics up and look at them?

Have to say, maybe not very excited. Mostly the topics make at least some sense but many of those coherence measures show higher values for bigger numbers. Like 100 iteration coherence for document sets 7 and 15 showing a set of topics around 150 would be great. Doc set 15 even has fewer documents than that. Manually looking at the generated topics, a large number of them are almost the same topics actually. They have mostly the same words, and very low weights for topics/words, meaning very few words in the docs got assigned to the topics. So it would seem that for most purposes topic count for these document sets is better at the lower number of topics. Unless maybe if you want to capture really fine grained differences in topics. Not sure what that would be good for but maybe it has some use cases.
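
The "almost the same topics" impression can be put into a number. A minimal sketch, assuming each topic is represented by its list of top words (for example, the words returned by show_topic), using Jaccard overlap between the word sets. The function names, the topic contents and the 0.5 threshold are made up for the example.

```python
from itertools import combinations

def jaccard(words_a, words_b):
    """Share of words two topics have in common, between 0 and 1."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b)

def near_duplicates(topics, threshold=0.5):
    """Return (topic_id_a, topic_id_b, overlap) for pairs at or above the threshold."""
    pairs = []
    for (id_a, words_a), (id_b, words_b) in combinations(sorted(topics.items()), 2):
        overlap = jaccard(words_a, words_b)
        if overlap >= threshold:
            pairs.append((id_a, id_b, overlap))
    return pairs

# invented topics: 0 and 1 share three of their four top words
topics = {
    0: ["city", "council", "budget", "school"],
    1: ["city", "council", "budget", "road"],
    2: ["music", "concert", "band", "album"],
}
dupes = near_duplicates(topics)
print(dupes)
```

Counting how many such pairs a model has at each topic count would give a rough second opinion next to the coherence score.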

So if the smaller number of topics would be better, maybe I need to try even smaller number of topics. Seems reasonable given the smallish number of documents I have. Like number of topics at 5, 10, 15, 20. See where that takes me. Here we go:

Doc set id coherence (autotuned parameters, 100 iterations)
1 t1s_lda_100
2 t2s_lda_100
3 t3s_lda_100
4 t4s_lda_100
5 t5s_lda_100
6 t6s_lda_100
7 t7s_lda_100
8 t8s_lda_100
9 t9s_lda_100
10 t10s_lda_100
11 t11s_lda_100
12 t12s_lda_100
13 t13s_lda_100
14 t14s_lda_100
15 t15s_lda_100
16 t16s_lda_10
17 t17_lda_10

Comparing these figures with the ones before for topic counts 25-200, the lower number of topics generally scored better here. Just for a quick comparison, most of these 2-20 sizes have the highest score close to -0.5 to -0.7, while the best scores for 25-200 were closer to -1.0. The exception is again document set 15, which trolls us again with a value close to -0.8 at 3 and 150 topics. Eh.

For final comparison and seeing what I think of the topics found at different sizes, I simply manually examined the topics by printing them to files like so:

__author__ = 'teemu kanstren'

from gensim.models import LdaModel
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from collections import defaultdict

import gensim
import operator, logging, sys

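def configure_log(log_path, log_name):
    #again, this configure_log is the same as in previous samples, so not repeating it here.
    #copy the body from the first listing before running this script
    raise NotImplementedError("copy the body of configure_log from the first listing")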

def process_lda_model(dict, model_file, topic_count, docs):
    log = logging.getLogger("bob")

    lda = LdaModel.load(model_file, mmap='r')
    topic_words = {}
    for t in range(topic_count):
        # top is now list of tuples (word, probability). topn=number of words to take
        top = lda.show_topic(t, topn=100)
        topic_words[t] = top

    #now calculate the size (or "relevance") of each topic. 
    #meaning how large a portion of all docs was assigned to each topic.

    topic_sizes = defaultdict(int)

    for doc in docs:
        doc_bow = dict.doc2bow(doc)
        dist = lda[doc_bow]
        for topic_word in dist:
            #count topic sizes by summing the percentage of all words in all docs assigned to that topic
            #(note: instances of one word can be in different topics across the doc)
            topic_id = topic_word[0]
            percent = topic_word[1]
            topic_sizes[topic_id] += percent
    log.info("sized topics")

    #now calculate the size (or "relevance") of each word in each topic in relation to other topics
    #so if word "hello" is 90% of topic A, which is itself 90% of all docs, "hello" gets a size of 0.9*0.9 for topic A

    topic_words_weighted = {}
    for t in range(topic_count):
        t_words = topic_words[t] #get the top words for this topic as stored before
        topic_size = topic_sizes[t] #the weight/size/relevance of this topic as calculated before
        tw_words = [] #to hold list of weighted words for this topic
        topic_words_weighted[t] = tw_words
        for word, percent in t_words:
            my_tuple = (word, percent * topic_size)
            tw_words.append(my_tuple)

    log.info("sized words")

    #sort the topics in numerical order so sorted_topics contains them in order topic 0, topic 1, topic 2, ...
    sorted_topics = sorted(topic_sizes.items(), key=operator.itemgetter(0))

    #finally, create a nice file to write it all out in my favourite format
    file_data = ""

    for topic in sorted_topics:
        topic_id = topic[0]
        file_data += "topic"+str(topic_id)+"="
        tww = topic_words_weighted[topic_id]
        for tw in tww:
            #word sizes are floats, and typically quite small ones. like 0-10 or so. 
            #multiply by 100 to give the values some diff when converted to ints
            word_size = int(tw[1]*100)
            file_data += tw[0]+"["+str(word_size)+"] "
        file_data += "\n"

    log.info("built file data")
    print(file_data)
    return file_data

#create the weighted word list for the docs given by the elasticsearch query stored in "s"
#assume lda models are stored under "dirname" in "fname" with specific extensions
def process_model(s, dirname, fname):
    log = logging.getLogger("bob")
    configure_log(dirname, dirname+"_topicbulklister.log")
    dict = gensim.corpora.Dictionary.load(dirname+"/"+dirname+'_hellome.dict')
    docs = []
    count = 0
    for hit in s.scan():
        count += 1
        #taking the lazy way out here, loading all docs into memory for processing
        #mostly because my doc sets are small and i got tired of optimizing everything when no real need
        #of course, it would be nice to have an example of doing it right for real cases later..
        #use the same field the dictionary was built from
        docs.append(hit.my_contents.split())

    log.info("loaded "+str(count)+" docs for:" + dirname)
    sizes = [25, 50, 75, 100, 125, 150, 175, 200]
    for size in sizes:
        #these would be models with 10 iterations
        log.info("processing model size:" + str(size))
        model_file = dirname +"/"+ fname + "_a_model" + str(size) + ".lda"
        file_data = process_lda_model(dict, model_file, size, docs)
        f = open(dirname+"/topics_a"+str(size)+".txt", 'w')
        f.write(file_data)
        f.close()

        #these would be models run with 100 iterations
        log.info("processing a100_model size:" + str(size))
        model_file = dirname +"/"+ fname + "_a_model_100" + str(size) + ".lda"
        file_data = process_lda_model(dict, model_file, size, docs)
        f = open(dirname+"/topics_a100_"+str(size)+".txt", 'w')
        f.write(file_data)
        f.close()


es = Elasticsearch()

#https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/
#http://miningthedetails.com/blog/python/lda/GensimLDA/
#https://groups.google.com/forum/#!topic/gensim/s4OivwKdfng

mapping = es.indices.get_mapping("my_index")
# find all document types in the mapping
keys = mapping["my_index"]["mappings"].keys()
types = [key for key in keys]

for type in types:
    #this is simply if you want to combine several, so the ES query is just a list for doc_type
    s = Search(using=es, index="my_index", doc_type=[type, type+"_extra_field"]) \
        .query("match_all").sort("doc_id")
    process_model(s, type, type)
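
The weighting in process_lda_model can be tried on toy numbers. The sketch below differs from the code above in one labeled way: it divides the summed topic shares by the number of documents, so that a topic's size reads as a fraction of the corpus, matching the "0.9*0.9" example in the comment. All names and numbers are invented.

```python
from collections import defaultdict

def topic_shares(doc_topic_dists):
    """Average share of each topic over all documents.

    doc_topic_dists: one list of (topic_id, share) tuples per document.
    """
    totals = defaultdict(float)
    for dist in doc_topic_dists:
        for topic_id, share in dist:
            totals[topic_id] += share
    return {t: total / len(doc_topic_dists) for t, total in totals.items()}

def weighted_words(topic_words, shares):
    """Scale each word's probability in a topic by that topic's share."""
    return {
        t: [(word, prob * shares.get(t, 0.0)) for word, prob in words]
        for t, words in topic_words.items()
    }

# two documents, both 90% topic 0 and 10% topic 1
doc_topic_dists = [[(0, 0.9), (1, 0.1)], [(0, 0.9), (1, 0.1)]]
topic_words = {0: [("hello", 0.9)], 1: [("world", 0.5)]}

shares = topic_shares(doc_topic_dists)
weighted = weighted_words(topic_words, shares)
print(shares, weighted)
```

Here "hello" ends up at about 0.81 and "world" at about 0.05, so a dominant word in a dominant topic stands out clearly against a word in a marginal topic.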

After dumping all my doc sets (1-17) like this, and looking at the ones getting the highest/lowest coherence values, I could not really say in any way that the topics would have been better for the highest coherence values. Certainly for these small document sets, the smaller topic counts were better if looking for clearly distinct topics. Which I think most people would look for. So I am sure there is some value here. And trying out the more accurate coherence metrics such as c_v (as discussed earlier in this post) would probably give better results. Maybe someday.

Alternatively, for a more visual exploration, there is also the option to use the LDAvis package. Wikipedia example:

__author__ = 'teemu kanstren'

import gensim
import pyLDAvis.gensim
import sys
import logging

size = int(sys.argv[1])
dir = "lda"+str(size)+"/"

logFormatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-5.5s]  %(message)s")
rootLogger = logging.getLogger()
rootLogger.setLevel(logging.DEBUG)

fileHandler = logging.FileHandler(dir+"ldavis"+str(size)+".log")
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)

consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
rootLogger.addHandler(consoleHandler)

log = logging.getLogger("bob")

log.info("processing model size:"+str(size))

log.info("loading dictionary")
dictionary = gensim.corpora.Dictionary.load_from_text('wikires_wordids.txt.bz2')
log.info("loading corpus")
corpus = gensim.corpora.MmCorpus('wikires_tfidf.mm')
log.info("loading lda")
lda = gensim.models.ldamodel.LdaModel.load(dir+'model'+str(size)+'.lda')

log.info("preparing model")
p = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
log.info("saving HTML")
pyLDAvis.save_html(p, dir+'lda'+str(size)+'.html')
log.info("done")

This dumps the whole LDAvis thing into an HTML file you can then load up any time later and play with. Nice thing about this is that it can be run on a headless remote server, and produces a single HTML file (a bit large but anyway). This HTML file can then be downloaded and opened from a local file. So no webserver needed anywhere, and the interactive visualization can be shared as a single file.

How does it look? To continue avoiding dumping the Finnish datasets here, I use examples for 25, 100 and 200 topics from Wikipedia:

25:
ldavis25

100:
ldavis100

200:
ldavis200

The first (and biggest) topic in the list of 25 is related to movies. Same for the 100 topics. In 200 topics, music takes the first spot. In 200, the second is about novels (book), third football, and finally movies come fourth.

In the LDAvis figure here for 25 topics, the cluster of four smaller ones on the right are related to Asian countries. In the topic word list below for 25 topics, these are topics 4, 14, 16, and 20. The numbering is just different because they are ordered differently. The LDAvis figure above for 200 topics also has a cluster of small ones on the left, with many of those for countries/states but also some for other topics such as chess, church, weightlifting and more. I am sure this would also be an interesting topic to study, why PCA groups them together.
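
A possible starting point for that question: my understanding is that LDAvis by default places topics by first computing the Jensen-Shannon divergence between their word distributions and then projecting those distances to two dimensions. Treat that as an assumption to check against the pyLDAvis docs. The distance itself is easy to sketch, here on invented four-word distributions:

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence, skipping terms where p is zero
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same words."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# made-up word distributions for three topics
topic_a = [0.5, 0.5, 0.0, 0.0]
topic_b = [0.4, 0.4, 0.1, 0.1]
topic_c = [0.0, 0.0, 0.5, 0.5]

near = js_divergence(topic_a, topic_b)
far = js_divergence(topic_a, topic_c)
print(near, far)
```

Topics that put their weight on the same words get a small distance and would land close together. Small topics with flat, unspecific word distributions would look similar to each other under this measure, which might be part of why they end up in one cluster.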

In general, there are a number of parameters to play with in LDAvis, and I don’t pretend to know all of/about them. For example, you can cycle through the topics using the controls on the top as well. A handy tool for topic exploration.

But I do also prefer just using the textual outputs of the topics as shown below. To see a large number of topics at once vs cycling through one at a time. Maybe some combination would work best.

The 25 and 100 topics from wikipedia for my text output code above:

25 Wikipedia topics (I manually tried to cut these down to the top 20 words from the 100 I printed, so it’s ~20 words each):

topic0=missouri[103342] wisconsin[87078] iowa[73418] virginia[70289] illinois[69885] arkansas[69130] carolina[68071] michigan[65583] ohio[60676] texas[60572] community[57331] washington[56913] indiana[54950] oregon[50765] florida[49446] district[46548] tennessee[46349] georgia[45458] california[45178] minnesota[45132] 
topic1=radio[76316] fm[67433] tv[52798] station[48613] channel[45489] television[39179] news[38537] broadcast[32143] broadcasting[27691] suffusion[26570] show[25864] am[24653] intelsat[24026] network[23193] owned[22635] pm[20234] presenter[17802] format[15873] program[15775] satellite[15183] 
topic2=village[221007] river[158933] district[140531] population[132051] km[116103] lake[111496] census[93802] island[90835] workers[84431] mountain[78471] park[74170] municipality[69123] creek[66698] reserve[65274] villages[63916] region[62479] road[61653] forest[61465] nearest[58958] town[58572] 
topic3=historic[213169] building[206192] station[165629] railway[147564] church[125265] register[124965] listed[100791] places[100328] street[90542] buildings[90221] brick[79067] roof[76969] bridge[72999] story[70627] road[62897] tower[62769] style[59430] district[57602] construction[57450] stone[54772] 
topic4=bangladesh[60954] india[47358] indian[43847] singh[37676] delhi[22949] kumar[22606] ludhiana[22469] sarpanch[21241] punjab[21091] bengal[19677] dhaka[19650] nepal[18896] hindi[17747] maharashtra[14853] raj[14251] bengali[14100] mumbai[13728] assam[13684] ram[12357] bangladeshi[11979]
topic5=mollusca[32685] mandal[26963] vijayawada[25245] space[22879] physics[18043] earth[17861] satellite[17756] ngc[17519] theory[17341] mathematics[17237] mathematical[16721] star[16649] subsp[15672] orbit[15671] solar[15360] indistinct[15112] purplish[14796] quantum[14606] observatory[14220] fascia[14179] 
topic6=art[151996] museum[105653] gallery[67696] painting[64919] artist[50553] exhibition[50485] painter[46831] paintings[42141] arts[41286] jpg[36333] artists[35557] sculpture[34781] temple[33499] exhibitions[33377] works[33171] collection[32549] meyrick[31776] fine[28686] file[26993] exhibited[26811] 
topic7=la[148371] le[80186] french[73902] german[71216] des[69606] italian[61779] der[61079] paris[60412] du[55323] del[54742] et[53219] spanish[53162] france[51363] jean[51084] von[49691] les[48917] el[46300] di[45814] josé[44796] und[42241] 
topic8=orchestra[44592] opera[35813] composer[34666] piano[29442] symphony[24092] conductor[18077] ballet[17176] violin[16627] choir[16239] musical[14962] pianist[14198] ensemble[13952] performed[13784] soprano[13767] composition[13440] concert[13162] concerto[13151] festival[13137] agder[12684] quartet[11731] 
topic9=episode[105985] films[103371] award[100538] television[100108] directed[93229] cast[92256] tv[91624] awards[90325] festival[89783] actor[88894] novel[84196] role[83970] drama[81154] theatre[79356] actress[79129] story[78659] director[78350] book[77382] episodes[71225] show[67188] 
topic10=research[125717] professor[103512] education[98381] science[92598] institute[90969] society[70253] students[68236] medical[66741] journal[66189] women[63658] studies[62359] award[60876] health[58041] sciences[56046] degree[55455] social[53099] engineering[49884] association[48622] director[48312] department[48123]
topic11=game[76288] software[49848] tamil[47686] india[47314] data[37736] business[33861] mobile[31384] indian[31195] app[31181] companies[30977] bank[30409] services[30306] million[30044] users[29862] http[29824] com[29450] technology[28942] founded[28176] platform[27679] online[27549]
topic12=bishop[183424] church[156784] catholic[100783] roman[92168] cathedral[61440] pope[58868] diocese[56411] priest[48203] king[48191] archbishop[45448] saint[37550] titular[35613] ordained[34872] religious[32680] papacy[32656] appointed[32149] consecrated[32043] prelate[31535] ancient[30725] holy[30355] 
topic13=scottish[70752] london[66229] sir[61661] william[57129] edinburgh[55924] married[53191] england[53094] scotland[50931] royal[50117] wales[49541] son[44603] ireland[42954] educated[38190] glasgow[37848] thomas[37705] henry[35982] george[35583] james[35313] daughter[35242] irish[34458]
topic14=hong[52137] kong[46242] korean[42778] kim[39075] norwegian[38362] chinese[35425] peakposition[34595] korea[33229] swedish[31117] china[27414] taiwan[26788] lee[25540] thailand[25034] qualifier[23037] thai[21629] norway[20943] jung[19624] min[19478] bangkok[19244] chen[18922] 
topic15=album[361182] song[256952] chart[187401] band[164060] track[134572] vocals[118688] guitar[109699] label[100924] songs[98269] listing[97232] you[96709] studio[95459] records[91948] albums[91396] charts[90836] release[86768] singles[84327] video[82745] singer[80484] bass[76230] 
topic16=japan[58447] japanese[57450] tokyo[33408] termen[29464] albanian[24897] anime[19283] albania[18629] fuji[17357] manga[15986] prefecture[15531] tbs[15178] osaka[12928] ntv[10843] tirana[9896] kyoto[8861] nagano[8601] ni[8014] nippon[7697] kazakhstan[7353] niigata[6965]
topic17=army[114556] regiment[93198] military[74472] navy[70157] division[68054] aircraft[65031] air[65002] ship[64259] infantry[63562] brigade[55079] commander[54114] battle[52332] corps[51897] command[49360] force[46507] naval[46317] forces[45781] battalion[44770] officer[41944] ships[41202]
topic18=al[101883] russian[82869] pakistan[49870] ukrainian[43693] ali[43688] sri[43566] khan[40688] soviet[39267] turkish[38736] moscow[37860] ukraine[35605] iran[35236] polish[33661] russia[30179] islamic[29989] indian[29284] mosque[28952] india[27978] turkey[27629] constituency[27275] 
topic19=league[349638] football[331012] cup[244141] club[238609] tournament[234016] championships[208445] championship[207165] round[184733] player[168229] goals[165863] games[164860] women[158768] coach[156908] basketball[153931] teams[149110] apps[147840] division[143359] professional[125977] match[125015] fc[120375] 
topic20=serbian[38639] china[36968] chinese[35505] serbia[28052] li[24803] croatian[22860] bosnia[19552] belgrade[19534] zhang[19404] wang[19349] herzegovina[16988] segunda[16327] greek[15680] croatia[15431] beijing[14989] rebounds[14775] liu[14475] yugoslav[13266] zagreb[12612] chen[12575]
topic21=engine[39614] energy[35303] power[34124] protein[33132] car[31071] model[30585] cells[29632] gas[27840] design[27833] production[27820] plant[27718] water[27239] system[24462] weight[24101] chemical[22907] acid[21817] gene[21771] cars[21734] type[21152] development[20916]
topic22=party[223593] election[194792] minister[116518] president[104109] elected[97838] council[93701] law[89045] democratic[88397] court[86376] elections[82934] political[81414] assembly[77292] votes[71977] politician[67919] committee[67674] parliament[67278] secretary[65255] union[63703] legislative[63331] police[61634] 
topic23=species[273887] genus[107924] fuscous[90171] mm[89481] forewings[76966] moth[71553] hindwings[67873] described[64660] grows[61203] wingspan[60843] dark[60647] costa[58702] grey[58502] shrub[58279] flowers[57526] ochreous[51859] australia[50788] description[48983] brown[48945] whitish[47779]
topic24=mf[54407] df[43579] outscored[43032] michael[27588] george[27293] james[27233] david[26909] cast[26476] robert[25935] paul[24548] jack[21916] william[21808] smith[21616] peter[21466] richard[20760] frank[20154] ap[19931] tom[19720] joe[18140] directed[18092] 
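
The manual trimming to ~20 words could also be scripted. A small sketch, assuming each topic is one line in the `topicN=word[number] word[number] ...` format shown above. The function name `trim_topic_line` and the regular expression are my own, not part of gensim:

```python
import re

# One "word[number]" token; the word is anything without whitespace or brackets.
TOKEN = re.compile(r"([^\s\[\]]+)\[(\d+)\]")

def trim_topic_line(line, top_n=20):
    """Keep only the top_n highest-numbered words of one topic line."""
    name, _, body = line.partition("=")
    pairs = [(word, int(num)) for word, num in TOKEN.findall(body)]
    # The lines above are already sorted, but sort anyway to be safe.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    kept = " ".join(f"{word}[{num}]" for word, num in pairs[:top_n])
    return f"{name}={kept}"

print(trim_topic_line("topic1=radio[76316] fm[67433] tv[52798]", top_n=2))
# prints: topic1=radio[76316] fm[67433]
```

Stray fragments that do not match the `word[number]` pattern are simply dropped, which is a side effect of the regular expression rather than a deliberate cleanup step.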

100 Wikipedia topics (too many topics here, so I did not try to cut them manually):

topic0=ufc[14709] cornwall[6614] akron[5052] quercus[5002] choke[3639] viaduct[3550] diablos[3463] nani[3381] cornish[3153] hokuriku[3095] zombie[2958] amarillo[2874] quezon[2823] cove[2805] shingle[2664] llanelli[2557] hyeon[2525] lubbock[2443] shooto[2318] bacolod[2253] boku[2209] devonport[2175] belltower[2106] aru[2044] tachi[2000] watashi[1924] quilt[1917] viterbo[1905] aki[1894] grahamstown[1894] angelica[1864] grosvenor[1835] jiu[1812] kacper[1745] yarmouth[1715] volgograd[1706] naru[1694] ives[1686] tomsk[1679] lawton[1665] chinatown[1615] vulgare[1612] bonifacio[1592] chelmsford[1574] pasco[1572] falmouth[1571] dorchester[1557] talmadge[1554] arnheim[1551] jitsu[1544] lunenburg[1542] carousel[1542] truro[1522] zombies[1518] herrero[1509] redruth[1474] brera[1468] águila[1443] rockville[1438] roswell[1434] atif[1417] devon[1417] christi[1411] alston[1404] lenox[1386] anata[1385] llm[1381] usta[1372] mana[1369] mojave[1362] kore[1331] gracie[1328] petrucci[1327] markham[1316] rockaway[1314] laredo[1314] mccord[1313] sherborne[1298] koti[1283] dutchess[1277] riggs[1252] barnstaple[1237] coney[1232] kono[1228] yell[1213] galán[1210] farris[1206] kanto[1205] mcallen[1203] winona[1183] tsa[1170] glitch[1157] buller[1155] nationaal[1152] bia[1144] sphagnum[1139] launceston[1132] bernardino[1116] woodbine[1111] reale[1110] 
topic1=russian[86175] bwf[66598] soviet[41495] moscow[40087] russia[36133] ukrainian[31745] ukraine[28202] hurdles[19550] vladimir[19430] armenian[14691] petersburg[14496] kazakhstan[12926] azerbaijan[12492] ussr[11904] saint[11006] mikhail[10926] armenia[10709] belarusian[10337] ivan[10284] nikolai[10273] alexander[10230] kiev[10169] sergey[9644] latvian[9465] ru[8885] union[8538] aleksandr[8422] georgian[8237] leningrad[7996] sergei[7734] на[7652] freestyle[7589] belarus[7582] azerbaijani[7228] dmitry[7075] latvia[6959] lenin[6831] riga[6595] boris[6426] lithuanian[6339] rostov[5940] andrei[5905] ssr[5866] konstantin[5819] backstroke[5784] pavel[5769] kazan[5688] oleg[5596] yuri[5595] igor[5324] stanislaus[5299] federation[5248] alexey[5109] viktor[5068] bolsheviks[4999] leonid[4986] lithuania[4954] republic[4886] stalin[4885] vasily[4867] pyotr[4826] crimea[4793] duma[4771] romanov[4737] featherweight[4621] almaty[4603] kyrgyzstan[4521] kazakh[4472] anna[4363] medley[4305] flanker[4203] uzbekistan[4177] olga[4136] caucasus[3973] botswana[3968] purge[3891] imperial[3857] по[3840] putin[3738] turkmenistan[3635] ivanov[3623] novgorod[3586] ural[3449] anastasia[3407] siberian[3393] alexei[3208] flyweight[3108] doubles[3024] bantamweight[3015] poltava[3001] empire[2943] surname[2928] maxim[2927] ufa[2924] greek[2923] graduated[2911] georgi[2906] disbanded[2905] player[2892] siberia[2885] 
topic2=acacia[26697] suffused[18767] oblique[10993] fifths[10506] fourths[9659] ell[9292] estrogen[6430] certifications[6119] testosterone[5572] estradiol[5196] snep[4692] blackish[4586] fimi[4473] lh[3303] androgen[3157] ultratop[3129] umass[3044] aas[2972] nz[2849] ant[2830] anabolic[2662] steroid[2532] lista[2444] crib[2338] fabricius[2304] thi[2139] progesterone[2123] ifpi[2055] bpi[2033] vg[1878] giannis[1821] nirmal[1821] pinball[1813] nirmala[1755] hitparade[1694] stinging[1642] kelso[1619] estrogens[1591] suomen[1570] bình[1529] invicta[1528] saito[1517] artem[1497] anh[1487] bp[1457] occ[1394] transporter[1390] nh[1389] wallaroo[1360] sixths[1332] iosif[1322] alcorn[1319] petiole[1303] ethyl[1276] educationist[1258] tran[1254] scoreless[1251] entomologist[1248] paw[1243] grayish[1232] professorships[1180] oriya[1174] intermedia[1171] staudinger[1164] wallonia[1137] hasbro[1112] pce[1087] danang[1081] rasa[1061] bpm[1057] bombus[1046] alder[1034] platformer[1022] amer[1017] đồng[1002] subunit[988] lindner[980] ios[975] ngai[969] basheer[965] bindi[957] gorman[952] hòa[948] oud[940] setar[935] panjab[934] nettles[933] brunner[902] cheetahs[902] bathinda[900] dawley[891] neuro[887] ahr[885] steroids[882] parsecs[880] dimethyl[875] dur[874] sahni[873] falcón[872] ura[871] 
topic3=village[109355] van[97424] dutch[86857] district[81290] municipality[78818] census[68729] population[66663] netherlands[51807] administrative[43905] amsterdam[37228] belgian[33340] settlement[32358] town[28706] province[28269] antwerp[26004] governorate[25952] rural[24730] belgium[23748] villages[23039] region[21933] inhabitants[21374] urban[21286] municipalities[20903] municipal[20388] utrecht[19781] het[19129] km[18919] brussels[18885] community[18634] der[18300] geography[17924] canton[17439] ghent[16428] rotterdam[15092] reorganisation[15005] jan[14911] flemish[14900] jpg[14850] flanders[14518] seat[14282] localities[14236] den[14032] liège[14008] leuven[13419] settlements[13306] republic[13264] willem[13032] file[12861] zambia[12806] division[12731] hague[12532] groningen[12448] center[12404] towns[12052] according[11938] river[11890] craftsman[11795] districts[11787] northern[11585] en[11513] leiden[11433] openstreetmap[11282] pieter[11231] haarlem[11033] consists[10939] nl[10694] holland[10557] divisions[10547] cities[10216] sint[10013] frans[9621] centre[9394] created[9380] bureau[9330] brabant[9082] church[8803] okrug[8795] bruges[8710] situated[8688] demographics[8589] suriname[8569] capital[8311] een[8257] effect[8099] surinamese[8045] sdf[8013] mechelen[8000] nijmegen[7975] zambian[7888] nederland[7879] limburg[7875] jurisdiction[7788] land[7736] divided[7722] delft[7710] voor[7706] central[7479] border[7408] norway[7344] arti[7342] 
topic4=river[144700] lake[113130] park[85940] creek[80149] island[75102] mountain[74717] forest[64862] reserve[62474] water[52777] site[50826] conservation[47660] stream[46429] valley[40905] region[40730] flows[40716] tributary[37487] mountains[37008] land[36780] bay[36438] nature[34530] lighthouse[34487] wildlife[34463] village[34400] rivers[33915] sea[33073] km[31437] species[31426] natural[30547] lies[29973] northern[29535] district[29350] protected[28939] range[28891] areas[28859] basin[28198] mount[27906] locality[27849] western[27748] southern[26962] province[26934] birds[26082] cave[25641] coast[25198] islands[25112] trail[24699] trees[24513] hipped[24159] elevation[24137] australia[23805] situated[23769] hill[23518] eastern[23385] town[23381] meters[23323] road[23186] southwest[22865] northwest[22762] confluence[22391] dam[22130] peak[21994] fish[21661] municipality[21330] northeast[21243] beach[21191] lakes[21074] peninsula[21034] flora[20955] rock[20762] forests[20256] above[20159] location[19914] point[19840] summit[19748] southeast[19698] fishing[19559] reservoir[19511] fauna[19157] jpg[18836] archaeological[18566] approximately[18498] border[18305] andes[18256] hills[17980] mouth[17872] geography[17837] canyon[17780] route[17663] formation[17231] climate[17134] blooms[17048] vegetation[17008] level[16838] parks[16303] access[16235] population[16118] cattle[16090] woodland[15944] source[15750] height[15743] rocks[15725] 
topic5=zealand[44443] fa[27430] auckland[23845] manchester[21181] england[21153] london[20403] town[19447] wellington[17898] yorkshire[17373] leeds[17219] councillors[15904] sheffield[15658] liverpool[15104] christchurch[14995] lancashire[14533] canterbury[14082] borough[14068] ward[13976] bradford[13950] nottingham[13657] archdeacon[13336] wales[13281] leicester[13136] bristol[13019] cardiff[12311] birmingham[12167] hibernian[12056] wards[11719] halifax[10861] midlothian[10837] park[10572] ontario[10460] scotia[10200] midlands[9725] newcastle[9638] welsh[9357] nova[9323] hull[8834] bowls[8702] council[8461] oldham[8461] durham[8328] otago[8252] scorers[8251] hon[8014] newfoundland[7959] essex[7799] brighton[7590] educated[7461] coventry[7390] chelsea[7379] unionist[7304] curling[7250] alberta[6996] stoke[6928] sunderland[6889] redistribution[6884] plymouth[6864] aston[6796] dunedin[6771] lib[6758] kingston[6705] exeter[6693] huddersfield[6684] attendance[6603] salford[6514] peterborough[6475] swindon[6446] middlesbrough[6411] watford[6360] cambridge[6354] bolton[6320] barrow[6313] bucurești[6260] scorer[6260] johnstone[6153] ipswich[6129] cheshire[6079] ireland[6052] barnet[6045] vale[5989] preston[5941] prop[5911] charlton[5889] wolverhampton[5832] southend[5626] northern[5527] manitoba[5507] davies[5486] athletic[5475] kensington[5470] canadian[5418] oxford[5417] ham[5371] stockport[5357] canada[5355] wembley[5321] queensland[5298] score[5277] sutton[5216] 
topic6=orchestra[35771] opera[30905] composer[28564] piano[21415] symphony[19113] festival[16541] ballet[15497] theatre[15462] gymnastics[15316] conductor[14614] musical[13243] ensemble[12998] choir[12860] violin[12808] performed[11796] pianist[11778] dance[11451] soprano[10907] concert[10711] directed[10634] gymnast[10602] conservatory[10567] concerto[10255] cast[10094] quartet[9296] composition[9177] frau[9061] theater[9058] starring[8832] classical[8608] op[8607] philharmonic[8498] studied[8125] chamber[8068] vaudeville[7625] director[7610] bach[7564] singer[7434] composed[7383] telenovela[7377] prize[7369] composers[7184] gma[7140] yoo[7122] teatro[7120] cbn[7016] abs[6888] works[6756] competition[6739] cello[6736] violinist[6604] artistic[6597] bibliography[6591] organist[6574] maria[6515] rhythmic[6462] drama[6402] dancer[6329] concerts[6252] soloist[6192] string[6182] jazz[6062] concise[5914] libretto[5852] clarinet[5828] premiere[5705] flute[5678] performance[5640] italian[5627] viola[5607] choral[5483] act[5412] anna[5255] rmnz[5235] mozart[5139] cinema[5133] teacher[5116] solo[5111] performances[5092] sonata[5090] la[5068] compositions[5061] tenor[5038] conducted[5029] ehf[5003] elena[4982] screened[4848] orchestras[4834] voice[4775] orchestral[4736] singing[4682] di[4632] premiered[4591] piece[4575] beethoven[4508] folkloric[4503] acts[4500] comedy[4469] silent[4419] performing[4416] 
topic7=missouri[89320] wisconsin[67139] iowa[59342] community[58234] virginia[50291] carolina[48249] illinois[46772] unincorporated[46333] porch[45947] vermont[43797] ohio[42105] maine[40117] arkansas[38441] tennessee[37574] railroad[37546] oregon[36736] indiana[34821] texas[33894] office[33652] alabama[32494] italianate[32284] post[31448] mississippi[31417] washington[29996] georgia[29282] pennsylvania[28409] kentucky[27526] kansas[26582] florida[23025] creek[22903] louisiana[22693] michigan[22401] massachusetts[22141] nc[22074] township[21957] maryland[21784] district[21039] town[20622] dakota[20567] oklahoma[20531] established[20255] nebraska[19566] jersey[18595] chicago[18236] remained[17690] minnesota[17594] operation[17364] louisville[17253] elementary[16629] historic[16451] schools[16357] franklin[16264] california[16201] moved[16194] delaware[15531] portland[15413] utah[15250] colorado[14996] route[14978] springs[14731] jefferson[14595] cemetery[14551] river[14545] milwaukee[14238] sec[14093] madison[14071] connecticut[14020] nashville[13944] miles[13489] william[13316] fort[13311] sioux[13247] lake[13246] jackson[13224] richmond[13101] charleston[13067] arizona[13029] lincoln[12906] bays[12777] burlington[12403] hill[12385] baltimore[12348] farm[12217] montgomery[12182] hampshire[12072] counties[11942] register[11682] ozarks[11622] nevada[11563] ld[11247] wyoming[11205] salem[11059] rhode[11049] fbs[10974] center[10953] valley[10821] farmstead[10779] orleans[10764] grove[10741] monroe[10638] 
topic8=league[263636] cup[220871] club[218025] championships[181957] football[161841] goals[159024] apps[146890] round[133792] championship[125108] women[118141] tournament[117783] fc[111936] teams[109893] player[106082] rugby[100262] games[97071] match[96036] rank[87517] draw[85827] division[85120] plays[83401] olympics[82290] event[81582] men[81047] competition[79039] footballer[77007] medal[76548] competed[75988] matches[75155] professional[72169] debut[71098] finals[69955] profile[68986] stadium[68557] metres[66188] champions[64770] summer[64106] results[63938] points[62208] european[61932] squad[58958] bronze[58547] players[56702] junior[56506] score[55013] playing[54854] olympic[54736] premier[51783] youth[51510] athlete[51163] liga[50601] gold[49455] statistics[46849] athletics[46726] volleyball[45992] champion[45221] qualified[44915] sports[43907] win[43020] silver[42695] qualification[42439] scored[42194] indoor[42081] loan[41377] play[41314] competitions[41285] winner[40405] heat[39794] qualifying[38926] clubs[38828] nationality[38800] coach[38685] winners[38451] midfielder[38392] runner[37707] nd[37468] opponent[37340] goal[36729] side[36602] badminton[35684] senior[35359] semi[35164] seeds[34921] rd[34874] challenge[34836] result[34777] uefa[34595] finished[34304] relay[33733] table[33213] record[32957] game[32393] appearances[32378] represented[32304] super[32271] sport[32164] title[31349] half[31237] level[30897] signed[30558] 
topic9=album[361508] song[250314] chart[187669] band[164009] track[132615] vocals[118988] guitar[110313] label[100615] songs[97321] listing[96822] studio[92239] albums[91577] records[90983] charts[90979] you[90448] singles[81265] release[81136] bass[77378] singer[77132] video[75510] billboard[75083] recorded[72940] tracks[70983] ep[70211] jazz[69051] drums[67918] rock[64948] love[63662] me[62255] recording[58899] artist[58120] digital[56808] cd[56142] download[54913] peakposition[53856] personnel[53695] pop[51970] live[50545] my[50247] producer[50013] featuring[49763] debut[48787] discography[48209] songwriter[43493] piano[41466] hot[40221] performed[39497] tour[39486] record[39388] written[37762] lead[37412] us[35568] peak[35432] dj[34982] hop[34931] saxophone[34907] reception[34621] blues[34225] sound[34024] peaked[33650] format[33552] hip[33443] lyrics[33059] remix[33036] solo[33027] dance[33013] artists[32878] date[32599] production[32403] performance[31775] eurovision[31580] title[30821] radio[30676] musician[30435] your[30340] version[30125] produced[29797] youtube[29292] we[29176] percussion[29166] uk[28805] allmusic[28730] musical[28721] guitarist[28573] keyboards[27759] don[27603] aria[27485] musicians[27099] backing[26959] background[26771] featured[26715] cover[26286] recordings[26191] mixing[25613] hit[25584] termen[25279] reached[25219] rapper[24782] duo[24744] weekly[23316] 
topic10=philippines[28452] philippine[20831] manila[16754] filipino[16695] language[9832] tag[8157] wwe[7682] ng[7374] eaves[7166] yerevan[6121] ang[6017] languages[5988] davao[5939] sunil[5607] och[5131] deaf[5115] clapboard[5052] nwa[4958] lucha[4893] mindanao[4829] deepak[4793] smokehouse[4421] rizal[4205] enugu[4124] sa[4020] aquino[3904] luzon[3781] assamese[3741] spinnin[3712] dialect[3707] frescoed[3535] mahi[3499] feu[3460] fayard[3408] anambra[3360] ni[3142] spoken[3039] venu[2987] sveriges[2965] laguna[2961] corazón[2924] kya[2879] zamboanga[2868] dialects[2824] belles[2809] oaxaca[2801] ghar[2783] libre[2758] akshay[2740] njpw[2702] madhav[2650] sanam[2624] dictionary[2621] sab[2621] speakers[2611] för[2517] universel[2491] cuenca[2478] filipinos[2462] word[2460] metro[2453] ka[2440] na[2411] vowel[2400] arroyo[2375] abia[2371] gucci[2333] naga[2324] cagayan[2297] nisha[2273] researchgate[2265] occidental[2205] sta[2172] tawi[2160] anupam[2102] wcw[2095] más[2087] words[2060] names[2016] visayas[2014] marcos[1973] minori[1970] hombre[1944] moro[1929] ett[1893] mo[1889] phonology[1883] sur[1881] ahrar[1875] det[1847] smackdown[1841] wrestled[1839] piya[1824] cervantes[1807] heures[1804] fils[1773] chua[1771] uppsala[1765] cotabato[1754] jose[1738] 
topic11=al[90513] ali[31307] islamic[29672] pakistan[28932] iran[27595] khan[26711] iranian[23927] mosque[23089] arab[21416] ahmed[19995] mohammad[19483] ibn[19123] syria[17976] thai[17756] abu[17592] muhammad[17546] saudi[17159] iraq[16853] arabic[16492] muslim[15818] pakistani[15164] thailand[14804] islam[14777] bangkok[14755] egypt[13893] ahmad[13464] el[13382] abdul[13079] mohamed[12695] iraqi[12160] afghanistan[11857] sheikh[11689] egyptian[11666] persian[11621] bin[11101] hassan[10870] shah[10687] aleppo[10406] arabia[10209] abdullah[9898] mohammed[9752] kuwait[8719] cairo[8577] ibrahim[8553] yemen[8281] rahman[8137] raion[8013] dubai[7739] afghan[7635] syed[7622] emirates[7459] sudan[7269] nakhon[7111] hasan[6897] bahrain[6754] muslims[6663] mirza[6591] imam[6348] baghdad[6345] hussein[6323] jordan[6263] morocco[6255] ismail[6228] maccabi[5926] sidi[5778] amir[5768] oman[5754] reza[5687] moroccan[5481] islamabad[5477] taliban[5308] sharif[5299] abd[5244] libya[5225] malik[5205] khalid[5157] shia[5146] province[5144] ul[5144] damascus[5095] sultan[5062] omar[4899] karim[4646] rashid[4639] hamid[4607] algeria[4550] medina[4479] khalifa[4474] arabian[4340] kabul[4328] mahmoud[4320] khaled[4293] din[4197] amin[4195] ambassador[4127] lebanese[4106] minister[4069] lebanon[4042] tunisia[4024] dhabi[4014] 
topic12=station[187036] railway[171967] bangladesh[82225] train[39954] trains[34326] road[31680] rail[28715] dhaka[28236] bus[28011] metro[27462] stations[27297] opened[26982] express[26722] junction[26120] passenger[25978] km[25396] uganda[24995] services[23384] district[22797] depot[22301] airport[21027] transport[20602] railways[19291] vijayawada[18672] platform[17594] town[16795] bangladeshi[16680] route[16098] village[15227] transit[15146] closed[14560] gauge[14403] cultivators[14045] situated[13798] traffic[13619] lines[13440] operated[13272] platforms[12980] section[12593] townland[12534] passengers[12458] construction[12209] branch[11973] govt[11924] kolkata[11916] terminus[11701] bengal[11294] delhi[11245] class[11063] jaipur[10823] halt[10439] chittagong[10422] terminal[10377] freight[10366] track[10272] wales[10244] via[9966] buses[9787] tram[9691] central[9564] cambridgeshire[9442] hossain[9361] kampala[9242] division[9027] transportation[8939] tangail[8936] queensland[8665] india[8591] derbyshire[8503] bengali[8454] nearest[8430] street[8391] ugandan[8334] goods[8255] stop[8187] shaheed[8149] side[8090] upazila[8027] aged[7874] railroad[7787] location[7783] western[7779] tracks[7708] rapid[7579] saurashtra[7539] projecting[7534] curacy[7534] zone[7478] household[7465] routes[7358] tramways[7352] chowdhury[7313] howrah[7290] facilities[7193] coast[7177] southern[7130] eastern[7109] code[7077] bridge[7072] trams[7035] 
topic13=pcc[11934] subterminal[6771] palsy[6568] wrexham[6258] pls[4319] aif[3312] antibody[3135] pd[3030] mykolaiv[2966] burrell[2901] manish[2722] cardiff[2700] sclerosis[2541] axillary[2492] zeller[2371] drooping[2334] motte[2240] psl[2221] toxin[2172] merthyr[2136] vejle[2089] sogn[2065] monmouthshire[2046] rhondda[2018] carcinoma[1969] bot[1878] caerphilly[1862] bridgend[1802] carmichael[1766] taf[1760] pk[1744] distal[1735] monoclonal[1731] sajid[1711] nines[1708] melanoma[1696] dbu[1672] nci[1656] physiotherapy[1615] blum[1610] mdm[1557] dione[1540] cervical[1535] mutations[1531] lymphoma[1528] antibodies[1521] snr[1516] selectivity[1474] tumour[1465] llandaff[1418] thyroid[1411] nanoparticles[1404] lesions[1388] bcl[1368] glamorgan[1354] whitchurch[1338] cynon[1332] ortho[1331] pkr[1306] jacobson[1293] marrow[1288] castell[1285] sternberg[1275] vertebrae[1272] transcriptional[1265] cdt[1262] chemotherapy[1262] apoptosis[1257] chirk[1253] nrg[1238] gait[1233] holyhead[1204] sma[1199] siegel[1194] protease[1175] janssen[1172] nanomaterials[1171] kazuma[1139] epstein[1129] taff[1129] gwilym[1117] akt[1106] tecnico[1100] proximal[1098] dystrophy[1098] orpheum[1087] therapist[1085] genital[1081] epo[1076] tia[1073] idw[1050] ord[1044] hpv[1034] arbeiter[1033] prognosis[1012] parañaque[1008] humerus[1008] autoimmune[1005] insulin[1004] horner[1002] 
topic14=surname[20701] david[19433] michael[19124] james[17093] player[16746] paul[15780] robert[15362] george[15139] jack[14993] tom[14721] smith[14293] steve[13863] joe[13580] peter[13297] frank[13200] mark[13122] richard[12767] chris[12041] jim[11898] tackles[11871] politician[11632] mike[11316] scott[11261] ryan[11178] aggies[11108] williams[11060] taylor[11014] william[10890] bill[10802] bob[10345] jones[10341] martin[10272] lee[10254] kevin[10116] footballer[10112] harry[10093] davis[10020] allen[9822] brian[9802] barry[9757] halfback[9709] tony[9671] ben[9667] charles[9366] jr[9327] australian[9136] sam[9127] wilson[9057] gary[8964] directed[8853] andrew[8736] starring[8699] johnson[8542] fred[8369] canadian[8277] brown[8274] thomas[8158] alex[8135] billy[7902] ian[7864] mitchell[7824] matt[7693] jason[7627] tim[7620] jimmy[7593] alan[7522] pat[7501] brien[7473] kelly[7447] actor[7428] graham[7376] stephen[7317] lewis[7294] miller[7233] murphy[7216] van[7187] eddie[7182] daniel[7142] ray[7095] craig[7089] refer[6923] anderson[6893] moore[6886] nick[6855] jeff[6845] gordon[6767] eric[6735] dave[6596] howard[6559] anthony[6537] ross[6521] bruce[6515] linda[6466] matthew[6455] russell[6406] henry[6397] snooker[6389] patrick[6363] calli[6232] joseph[6232] 
topic15=hungarian[25236] hungary[16810] budapest[14378] kor[11210] eun[7707] mediacorp[6145] samsung[5732] magyar[4881] tc[4611] nemzeti[4602] istván[4342] faroese[3953] lászló[3950] koi[3766] ferenc[3711] ffu[3648] nagy[3300] nokia[3287] aac[2833] smartphone[2775] gábor[2675] sándor[2665] péter[2639] és[2523] encryption[2506] iot[2504] callsign[2424] ktv[2373] farkas[2273] usr[2232] yoshimoto[2224] szabó[2216] hu[2210] sidelight[2197] brickwork[2156] myx[2120] kento[2098] esperanto[2065] profesional[2044] se[2007] canoeist[1997] combinator[1989] zoltán[1979] afd[1966] lajos[1952] andrás[1928] szabolcs[1920] szeged[1919] militare[1879] zemplén[1873] arad[1860] vas[1838] yume[1811] tsubasa[1810] sia[1809] snapdragon[1797] huawei[1795] miklós[1771] bt[1771] qa[1767] ini[1764] ando[1686] wma[1685] tok[1685] győr[1683] tibor[1658] reg[1629] károly[1598] airtel[1537] bács[1526] dab[1522] tdt[1505] lexikon[1504] fujifilm[1459] ong[1458] ogura[1453] artforum[1435] sms[1394] erb[1392] pécs[1379] torun[1375] wearable[1373] cbr[1349] asp[1348] ege[1336] itu[1333] wifi[1322] ob[1307] messaging[1305] nsa[1305] kodak[1303] veszprém[1302] zhe[1301] voip[1291] mária[1286] lz[1281] eto[1275] thieme[1265] tr[1259] verizon[1259] 
topic16=vidhan[16251] damselfly[12497] csx[10474] mla[10186] ethiopia[6522] ethiopian[5162] sena[5101] kalyan[4929] shiv[4922] mandir[4507] branchlets[4407] breuning[4337] melaleuca[4113] addis[3970] dnq[3943] gables[3924] inmates[3707] thane[3319] ababa[3294] greensboro[3225] hrs[3046] vihar[3014] djibouti[3000] uab[2899] shelby[2848] wcc[2751] chakravarthy[2621] bahujan[2507] словарь[2506] gamecocks[2393] psychical[2351] modesto[2317] gauri[2261] bandra[2255] eucalypt[2245] palghar[2207] jayachandran[2110] liliana[2098] fayette[2093] roja[2045] kathi[2039] curran[2014] pfeiffer[2003] aparna[1990] nashik[1961] potts[1913] byard[1890] somali[1871] dusted[1857] sash[1813] knepper[1784] storekeeper[1755] anant[1710] bsp[1685] merrimack[1677] sawant[1676] chelyabinsk[1640] samford[1637] boardman[1607] tobin[1607] calhoun[1586] adama[1582] psd[1581] taft[1577] septa[1572] swp[1570] ashland[1568] bronson[1564] zootopia[1560] troup[1559] paas[1536] tana[1529] trenton[1519] sheva[1518] donati[1512] subiaco[1509] etv[1500] decatur[1496] spiritualist[1494] corcoran[1485] sarita[1475] milford[1470] dedham[1468] jaki[1462] igcse[1453] roxbury[1450] rosenwald[1429] yougov[1426] amal[1422] dieterle[1412] halfpipe[1412] carver[1392] nadya[1390] sion[1388] ossetia[1387] gunnarsson[1378] argento[1371] timi[1364] wilmington[1362] ncp[1357] 
topic17=wta[22884] mathematics[19980] mathematical[18654] theory[15908] martina[12298] equations[11340] geometry[11221] mathematician[11124] quantum[10372] equation[10254] graph[10104] function[9940] theorem[9247] differential[8559] algorithm[8528] ibadan[8515] problem[7394] algebra[7304] finite[7232] linear[7083] functions[6858] algebraic[6848] probability[6793] lucie[6597] space[6190] navratilova[6141] analysis[6099] shrestha[5983] algorithms[5853] method[5820] matrix[5807] dimensional[5764] computational[5708] numerical[5632] vector[5614] model[5299] graphs[5052] variables[5006] stubbs[4987] evert[4908] vertex[4726] topology[4581] solution[4574] methods[4558] nonlinear[4433] random[4397] value[4352] distribution[4330] hingis[4328] we[4326] physics[4320] given[4295] mathematicians[4236] partial[4228] optimization[4201] defined[4182] biju[4122] bolded[4115] constant[3980] obscurely[3976] example[3944] vertices[3910] dynamics[3853] mechanics[3847] sania[3833] point[3830] numbers[3799] problems[3781] variable[3685] values[3658] case[3556] discrete[3543] sequence[3529] polynomial[3510] sum[3501] complex[3493] plane[3484] jelena[3472] applications[3450] geometric[3435] approximation[3382] let[3361] metric[3332] fluid[3276] properties[3273] formula[3252] topological[3252] models[3215] cube[3207] petrova[3184] statistical[3136] triangle[3119] definition[3108] integer[3095] proof[3094] hyperbolic[3076] symmetry[3041] triangles[3038] destino[3029] field[3021] 
topic18=cricket[70887] cricketer[25748] matches[20404] puerto[20315] class[19353] wickets[17806] match[16789] venezuela[15988] campeonato[15007] rico[14417] innings[13909] runs[13602] batsman[13525] odi[13431] bowler[13413] wicket[11960] trinidad[11463] icc[11291] debut[11263] arm[11172] bowling[10778] right[10526] colombia[10297] tobago[10118] trophy[9932] cuba[9050] muisca[9012] rica[8974] costa[8634] cricketarchive[8589] overs[8499] handed[8115] rican[7947] scored[7937] twenty[7854] caribbean[7792] mirren[7678] kent[7570] indies[7422] ranji[7247] honduras[7244] test[7216] kilmarnock[7171] panama[7041] cuban[7038] caracas[7011] venezuelan[7004] sri[6832] batting[6760] ground[6622] espncricinfo[6605] partick[6577] clube[6541] changsha[6514] barbados[6181] que[6082] mcc[6022] dominican[5956] lanka[5949] warwickshire[5856] guyana[5742] uruguayan[5675] raith[5610] nicaragua[5533] cricketers[5523] gómez[5511] middlesex[5377] medium[5366] julio[5197] xi[5186] surrey[5160] rovers[5123] ecuador[5067] sussex[5065] colegio[4853] highest[4805] uruguay[4732] bowled[4678] balls[4600] futebol[4569] nottinghamshire[4539] scorer[4337] leicestershire[4325] vida[4318] guatemala[4254] domestic[4201] herrera[4196] honduran[4071] jamaica[4011] joaquín[3955] pakistan[3931] liberia[3751] fast[3713] mendoza[3650] glamorgan[3606] campos[3602] rivas[3522] guadalajara[3515] havana[3489] zimbabwe[3468] 
topic19=communes[17975] brewery[13119] lyon[12060] beer[11272] france[10976] french[8692] saint[8391] toulouse[8128] senegal[7447] faso[7334] burkina[7199] commune[6923] chargé[6803] affaires[6650] department[6298] benin[5931] brewing[5898] marseille[5866] michelin[5823] loire[5758] senegalese[5614] baku[5591] metz[5380] dsq[5101] dakar[5032] rouen[5018] haute[4886] lettres[4601] vie[4553] château[4362] calais[4288] podiums[4160] grenoble[4146] tarn[4120] havre[4050] littérature[3972] autres[3915] bordeaux[3893] maison[3718] aix[3532] ale[3522] beers[3471] le[3435] digitisation[3398] caen[3298] étienne[3276] niger[3275] chn[3272] techcrunch[3264] arrondissement[3221] cfa[3186] perpignan[3121] la[3018] arras[3000] ajaccio[2971] breweries[2961] chef[2959] margined[2911] terre[2859] troyes[2818] québec[2791] mali[2772] gaston[2709] abidjan[2700] sur[2666] derulo[2637] seigneur[2625] aliyev[2602] brewer[2600] chapelle[2561] ind[2479] auxerre[2473] guerre[2459] beatport[2430] du[2419] reims[2355] et[2333] en[2311] poulsen[2306] nigerien[2301] autódromo[2275] mauritania[2251] vieux[2238] guadeloupe[2237] bhr[2227] jenn[2223] dieu[2205] loup[2159] collège[2148] nord[2148] xixe[2147] département[2126] fih[2117] fournier[2115] flávio[2075] porte[2073] redlands[2056] seine[2049] clément[2021] pape[2002] 
topic20=bridge[81869] highway[51163] road[48105] route[44978] slate[21135] bays[19114] odonata[18852] bridges[15837] intersection[15582] gastropods[13944] farmhouse[13383] terminus[12993] arched[12851] crosses[12836] river[12707] tunnel[12539] curves[11325] truss[11240] sr[11008] us[10609] border[10163] junction[10083] traffic[9801] expressway[9762] intersects[9092] whorls[9050] creek[9016] roofed[8883] span[8882] vermont[8873] crossing[8738] intersections[8593] connects[8569] northeast[8519] runs[8470] travels[8467] roads[8388] begins[8290] highways[8208] interchange[7978] continues[7808] sc[7653] wfc[7244] description[7082] kentucky[6930] northwest[6882] street[6792] outbuildings[6730] avenue[6590] homestead[6586] sh[6449] motorway[6351] spans[6313] passes[6059] octagonal[6035] lane[5832] length[5814] toll[5746] arch[5618] northern[5475] roadway[5473] enters[5210] footpath[5122] raipur[5026] roofs[4995] southeast[4956] section[4909] ends[4868] meadows[4855] geograph[4723] adac[4604] lanes[4576] cottages[4563] eastern[4505] devonian[4428] carries[4427] steeply[4304] rural[4300] interstate[4217] construction[4126] km[4085] transportation[4084] southern[4079] rfu[4067] sills[3996] connecting[3987] paleontology[3924] concrete[3854] crossings[3802] bypass[3798] cambrian[3787] parkway[3766] deck[3763] covered[3710] junctions[3700] stratigraphy[3695] quarried[3654] ordovician[3636] ammonites[3609] segment[3564] 
topic21=soo[11930] hee[10040] idaho[10022] yoon[9916] jae[9198] dong[8736] kang[7681] kyung[7475] joo[7268] namibia[6703] africa[6399] seung[6155] jeong[5723] boise[5592] wac[5543] natal[4922] african[4544] pretoria[4390] kwazulu[4226] cameroun[4148] cape[4008] sang[3945] ahn[3921] hwan[3871] namibian[3686] hae[3534] tubercles[3508] hyo[3402] rogério[3391] spokane[3354] ju[3289] lesotho[3179] baek[3097] transvaal[2979] agarwal[2916] bae[2912] chae[2813] ting[2758] cassa[2733] sik[2694] apartheid[2667] comers[2647] cho[2565] gu[2522] grenada[2519] sook[2443] risparmio[2380] sarsfield[2342] uddin[2287] watanabe[2234] lê[2176] anura[2152] malian[2132] rockingham[2065] kwang[2054] telenovelas[2051] matti[2040] divya[1972] gi[1946] regionale[1941] stellenbosch[1933] за[1921] miho[1905] grenadian[1887] jeeva[1870] afrikaans[1854] everard[1848] anc[1783] aya[1781] subramaniam[1772] gyu[1767] carnarvon[1763] cardona[1749] kyun[1730] maki[1705] whitman[1677] stapleton[1676] fondazione[1670] amphibia[1645] momo[1614] reykjavik[1601] diop[1594] tremblay[1581] custer[1568] hailey[1563] hsien[1553] mateus[1548] walla[1530] eparch[1526] jeremih[1517] saa[1453] chieti[1450] keough[1436] dimitar[1421] bloemfontein[1402] grenadines[1399] yun[1393] dal[1390] président[1387] muller[1382] 
topic22=czech[39417] prague[22089] fivb[19394] slovak[13613] steeplechase[9901] susheela[9603] bratislava[8232] hc[8147] republic[7745] slovakia[7266] czechoslovakia[7142] czechoslovak[7131] blocker[7112] bhosle[6804] brno[6514] iihf[6327] sv[5948] yesudas[5609] jiří[5353] arun[5225] soundararajan[5212] petr[4895] václav[4784] dq[4670] rajan[4561] sangeet[4520] ghantasala[4500] mani[4338] ostrava[4169] cz[4154] kk[3985] wr[3975] josef[3956] bohemia[3916] iyer[3832] miloš[3808] praha[3792] karel[3621] maharaj[3512] tochter[3417] biswas[3399] jan[3365] srinivasan[3357] suman[3346] bhojpuri[3316] františek[3183] maa[3172] jana[3052] pavel[2892] vladimír[2887] tomáš[2881] slavia[2833] naresh[2827] andrej[2784] mahadev[2759] zeman[2746] moravia[2732] sk[2715] jaroslav[2688] mfk[2643] aravind[2641] michal[2616] jozef[2609] sparta[2591] antonín[2559] ilaiyaraaja[2559] nad[2538] shaan[2482] miroslav[2474] balakrishna[2447] ttt[2272] plzeň[2256] bohemian[2241] olomouc[2207] kunal[2197] ján[2148] grambling[2136] vestnik[2117] škoda[2098] anagennisi[2074] zdeněk[2054] liberec[2033] galindo[2016] pallavi[2007] krishnamurthy[1999] garg[1961] raga[1916] usha[1859] regionals[1850] ladislav[1838] sonu[1837] tabla[1823] jayaraman[1823] pradhan[1811] ashwath[1789] mulher[1786] jakub[1765] tatran[1760] jawahar[1753] federación[1738] 
topic23=administrated[25161] power[21522] locomotives[18333] storeys[17026] plant[15320] locomotive[15079] cornice[14926] class[13844] mine[13587] coal[13500] mw[12841] dam[11890] gas[10465] obliquely[10410] mining[10378] steam[9735] fremantle[9610] electric[9046] energy[8999] capacity[8502] diesel[8337] electricity[8170] hydroelectric[7501] malta[7468] solar[7368] mines[7089] cars[6783] oil[6654] kw[6479] railways[6440] station[6066] maltese[6004] steel[5801] railway[5757] engine[5705] wind[5515] nuclear[5340] car[5184] iron[5114] subcostal[4836] construction[4822] ore[4779] busan[4573] production[4456] vr[4376] tender[4146] gwangju[4126] reservoir[4096] dockyard[4091] turbine[4069] colliery[4029] engines[4011] water[4003] project[3948] tenders[3668] operated[3635] tons[3520] type[3417] rabbitohs[3405] furnace[3373] turbines[3311] quickie[3287] tenths[3259] boiler[3258] traction[3248] fuel[3171] foundry[3128] storage[3049] storefront[3041] miners[3029] transversely[3021] renumbered[3000] tonnes[2949] factory[2948] gozo[2918] generation[2903] bracketed[2900] tank[2898] installed[2861] kv[2856] dfl[2815] renewable[2796] units[2763] fuzhou[2760] owned[2724] grid[2722] fluted[2712] snapchat[2692] copper[2689] mill[2676] valletta[2660] semicircular[2640] petroleum[2631] shaft[2618] delivered[2610] wheel[2605] wheels[2574] supply[2535] stations[2471] facility[2462] 
topic24=game[92200] software[56005] data[39318] app[37672] users[32711] mobile[30274] video[26346] games[25900] user[25583] computer[25037] android[23089] system[22762] player[22091] platform[21675] microsoft[20663] web[20518] code[19984] developed[19919] google[19621] windows[19399] technology[19255] digital[18659] online[18543] systems[18467] players[17161] cloud[16571] content[16469] application[16324] internet[15098] features[14897] available[14828] version[14605] devices[14470] using[14464] development[14431] startup[14063] information[13345] open[13127] virtual[13112] design[13060] allows[12975] network[12855] gaming[12618] product[12379] source[12311] project[12289] device[12216] ibm[11896] access[11812] interactive[11623] linux[11611] applications[11551] developers[11253] server[11185] card[11161] pc[10865] developer[10722] tools[10636] interface[10298] phone[10269] mode[10258] camera[10224] kickstarter[10093] com[9914] apple[9883] security[9826] computing[9746] os[9715] free[9705] cards[9688] announced[9654] page[9627] graphics[9560] hardware[9515] launched[9419] release[9413] file[9381] xbox[9332] products[9133] model[9080] designed[9061] create[9044] support[9027] http[8836] machine[8733] search[8653] programming[8604] uses[8533] control[8428] provides[8352] different[8212] database[8147] management[8125] files[8046] platforms[8022] memory[7969] technologies[7915] tool[7889] storage[7856] electronic[7810] 
topic25=football[168081] basketball[130011] coach[119979] conference[103829] ncaa[103399] league[83100] tournament[79026] nfl[75899] games[67297] head[64610] game[61909] yards[60557] record[60153] baseball[57477] division[56715] player[56233] schedule[53107] draft[50894] opponents[46909] represented[45635] finished[45572] soccer[44432] michigan[43819] outscored[43125] stadium[43102] overall[40673] professional[39639] round[39500] regular[38094] big[37131] players[36514] championship[35566] athletic[34236] men[33883] play[33235] texas[33105] teams[32709] women[32124] roster[31482] bowl[29882] california[29635] signed[29311] san[28450] florida[27678] tigers[27384] defensive[26615] arena[26332] compiled[26119] arizona[26052] led[25717] field[25675] points[25571] mac[24176] senior[24033] junior[23842] selected[23249] bio[22776] carolina[22588] hockey[22445] drafted[22315] standings[21219] rushing[21166] coaching[20760] pm[20725] nba[20101] cal[19849] sports[19802] playoffs[19793] seasons[19673] wins[19576] fiba[19400] eagles[19277] center[18727] week[18637] miami[18571] ohio[18449] vs[18433] georgia[18415] sophomore[18202] attended[18049] tackle[17990] guard[17912] chicago[17912] losses[17886] indiana[17777] washington[17775] missouri[17702] tie[17657] win[16981] illinois[16910] bracket[16848] finish[16704] minnesota[16690] association[16048] kansas[15981] ten[15802] lost[15738] coaches[15738] diego[15485] broncos[15482] 
topic26=costal[33443] mollusk[27646] protein[27158] cancer[24854] disease[23213] cell[21830] treatment[20435] gene[20141] drug[19786] cells[18736] clinical[18618] patients[18287] medical[15996] virus[14914] health[14070] brain[13899] proteins[13141] dna[13061] diseases[12475] human[11691] receptor[11244] genetic[10699] blood[10497] bacteria[10478] patient[10405] effects[10392] genes[10201] paralympics[10161] marijuana[9937] genome[9770] syndrome[9732] subtotal[9479] disability[9440] rna[9369] drugs[9356] humans[9335] tissue[9292] medicine[8943] research[8745] therapy[8697] symptoms[8597] bacterial[8540] activity[8438] molecular[8421] function[8343] infection[8324] gastropod[8120] disorders[8035] type[7917] surgery[7887] cerebral[7685] skin[7434] indexed[7407] associated[7295] study[7138] pain[7133] strain[7120] abstracted[7089] animal[6954] disorder[6904] expression[6864] non[6681] tumor[6618] acid[6536] impairment[6535] breast[6341] related[6327] development[6253] viral[6175] hiv[6123] muscle[6105] diagnosis[6100] bone[6052] trials[5994] binding[5957] isolated[5955] mollusc[5934] amino[5873] cause[5866] transcription[5854] risk[5850] animals[5834] growth[5831] liver[5828] encoded[5788] membrane[5774] studies[5742] biological[5692] viruses[5686] slug[5665] specific[5664] receptors[5542] phase[5516] inhibitor[5476] host[5448] vaccine[5440] lung[5304] classification[5265] bac[5252] species[5248] 
topic27=anime[16382] japanese[13782] oricon[13299] manga[12823] japan[10491] jammu[10016] kashmir[9667] nakodar[9023] ntv[8552] tba[7103] ni[6954] nhk[6796] theme[6105] asahi[6018] tokyo[5720] anjali[3907] mato[3671] akita[3499] kottayam[3336] kannur[3322] ga[3285] volumes[3237] sakura[3188] khalsa[3115] ultraman[3102] priya[3093] idol[3083] tv[3043] mello[3026] ending[2961] yokohama[2939] uta[2898] ai[2898] kimi[2897] suzuki[2849] nana[2834] siddharth[2823] cerrado[2793] ishq[2723] shōnen[2676] suraj[2617] takahashi[2539] minami[2480] odia[2461] ishikawa[2435] gilgit[2431] sagar[2379] viswanathan[2332] kita[2308] ame[2286] sundaram[2261] avex[2144] giri[2143] seema[2134] volume[2129] várzea[2110] yamamoto[2108] kashmiri[2101] animation[2082] tocantins[2050] mata[2020] deen[2016] kishan[1998] raza[1998] amparo[1930] manaus[1893] tabi[1848] não[1839] ghulam[1836] atsushi[1799] ondo[1795] illustrated[1795] kobe[1780] storyboard[1768] hiroki[1758] tankōbon[1751] kubo[1745] rondônia[1731] cba[1727] dhillon[1718] azad[1716] weekly[1708] amala[1703] komatsu[1699] professionnelle[1698] serialized[1683] satyam[1673] dub[1647] puccini[1645] multan[1629] adaptation[1592] nami[1542] diya[1537] kumi[1526] toei[1525] aaj[1504] trax[1502] lovely[1492] uday[1478] shueisha[1451] 
topic28=korean[35847] kim[30051] korea[29837] lee[14779] jung[14661] skating[14240] min[12254] jin[11575] hyun[10194] isil[9998] ji[9854] seoul[9568] choi[8645] woo[8191] cha[7692] seo[7518] han[7183] champ[7146] hangul[7079] skate[7043] tcr[6875] tae[6806] ho[6804] sung[6786] jang[6767] tehran[6548] yeon[6521] skater[6485] zh[6296] jong[6224] ri[6025] park[6014] yong[5749] isu[5572] joon[5053] shin[4997] joseon[4669] hanja[4664] young[4536] figure[4510] dns[4338] ye[4329] mi[4156] iran[4082] incheon[3969] ktm[3962] mrt[3765] konitz[3660] bab[3655] nam[3650] oh[3619] hwa[3537] il[3496] jo[3399] abbas[3363] tabriz[3329] assyrian[3224] medalist[3188] hong[3174] ki[3133] prix[3105] seong[3090] lrt[3071] na[3039] dae[3020] jun[2967] yeong[2903] acb[2896] ara[2863] su[2801] olímpico[2782] yi[2705] yang[2614] sun[2611] seon[2501] hao[2499] se[2430] ae[2293] fs[2255] pyongyang[2240] wang[2230] chang[2224] ra[2201] ro[2136] geun[2115] fam[2107] chong[2087] jilin[2062] baloncesto[1975] pang[1975] saff[1937] roh[1930] raqqa[1915] nat[1914] bala[1868] yanbian[1848] bahá[1830] irib[1818] province[1777] shiraz[1765] 
topic29=wildcards[10346] efl[8624] ipc[7423] gambia[6841] ecac[6205] bitcoin[5972] fcs[5506] thani[5111] torneo[4644] gambian[4098] colgate[3635] pf[3454] beira[2869] carex[2780] apache[2684] jiangxi[2670] mozambique[2628] js[2609] sedge[2475] maputo[2459] villes[2417] utsa[2363] selfie[2338] batten[2233] ziyang[2168] taka[2033] iphone[2029] jur[1931] jax[1897] mussel[1879] nt[1870] yeo[1826] swaziland[1813] jaro[1790] pattaya[1750] swazi[1745] mozambican[1742] mady[1728] ipad[1675] dp[1653] marques[1647] brazzaville[1638] argentino[1547] unimproved[1525] nla[1512] wp[1501] password[1484] ange[1454] baruah[1453] kora[1452] harpe[1435] bourne[1430] bangui[1430] io[1415] noticias[1398] authentication[1385] malawian[1382] friedl[1382] bluetooth[1381] lv[1372] toomey[1353] suriya[1334] audiovisual[1320] botoșani[1313] bom[1277] mamadou[1270] cutler[1259] trina[1256] redshirting[1249] torrent[1235] halliday[1227] aiaw[1226] mccloud[1222] foss[1220] php[1204] tunde[1202] nanchang[1183] comix[1176] pmpc[1163] css[1162] lamin[1155] navale[1153] longerons[1150] noire[1145] apps[1145] folder[1144] hoyas[1134] mazur[1114] gnat[1100] zeke[1099] handa[1096] hacker[1086] ssl[1079] arends[1070] ocr[1067] mirosław[1055] passwords[1055] hendra[1052] engström[1050] backend[1048] 
topic30=historic[193472] building[179428] register[113178] places[90102] listed[88132] buildings[79081] street[75166] brick[69329] roof[67461] story[62903] tower[52445] style[51431] hotel[49037] architecture[48015] stone[46478] district[46319] revival[44738] hall[40948] church[40597] architect[40090] windows[39147] designed[38031] contributing[37870] floor[36623] construction[35864] park[35037] structure[34527] property[34467] jpg[34109] gable[33391] frame[32457] square[31410] site[31288] facade[30480] front[30413] entrance[29545] side[29462] houses[28506] constructed[27928] architectural[27149] center[26613] design[26172] listings[25793] town[25776] added[25669] features[25651] library[25337] museum[25245] heritage[25105] file[24933] residential[24332] monument[24328] interior[23329] barn[23091] walls[23080] road[22749] room[22435] opened[22346] dwelling[21879] avenue[21874] queensland[21736] bay[21618] castle[21502] central[21111] memorial[20558] arkansas[20480] grade[20276] complex[20266] courthouse[20261] commercial[20261] window[20093] corner[20039] mill[20002] wall[19997] restaurant[19843] wood[19638] farm[19548] rooms[19356] residence[19224] store[19116] rectangular[18970] rear[18560] architects[18369] block[18136] demolished[17979] concrete[17959] centre[17855] office[17202] timber[17167] originally[16922] feet[16856] completed[16784] columns[16679] limestone[16569] plan[16388] garden[16332] location[16292] indiana[15871] cemetery[15863] floors[15857] 
topic31=depressariidae[18966] gelechiidae[18681] tornus[16393] blooms[14372] spots[13740] transverse[11411] margin[11176] dot[10092] markings[8052] elegans[7697] turrids[7547] lecithoceridae[7223] mohan[7141] queensland[5973] phagwara[5929] rajkumar[5473] anal[5347] subspecies[5228] xyloryctidae[5205] turridae[5192] weatherboard[4475] faint[4401] radha[4392] dull[4199] uchicago[4103] botany[3967] lepidoptera[3943] suture[3734] animalia[3702] durga[3701] streaks[3638] moths[3565] surya[3556] chitra[3465] modi[3428] rounded[3417] zoology[3386] guiana[3231] cand[3210] puri[3015] autostichidae[2988] kimberley[2954] tick[2929] sangeetha[2858] undefined[2777] upanishad[2738] dichomeris[2729] crab[2723] guntur[2699] mycologist[2696] sarma[2683] gujarat[2631] edu[2626] hanuman[2615] elina[2588] puillandre[2563] mushroom[2560] bissau[2540] sinuate[2526] rosy[2500] entomology[2486] thakur[2483] kanchana[2449] gecko[2396] nee[2373] tanjong[2365] drosophila[2317] meenakshi[2235] lip[2219] replication[2193] vasu[2172] attenuated[2157] purple[2114] girish[2082] attains[2061] genomic[2039] sastry[2007] tentacles[1990] mohanty[1986] upendra[1942] shuai[1906] raja[1902] prabha[1900] butterfly[1879] karna[1862] drepanidae[1852] mathura[1835] parasitology[1833] buhari[1823] ramachandran[1818] veera[1809] uma[1787] chakra[1773] ajit[1758] genomes[1738] venkateswara[1717] rajputs[1716] indiewire[1696] radhakrishnan[1693] kutch[1692] 
topic32=ludhiana[27629] congo[6766] ipsc[5463] ellington[5445] braxton[5417] motherwell[5190] congolese[4947] wcha[4442] roach[4309] mehldau[3485] lehigh[3285] gillespie[3269] dizzy[3139] thelonious[3039] getz[3035] mcbride[2802] mcintosh[2749] cohn[2699] bop[2652] niu[2651] brubeck[2642] verve[2548] jobim[2537] sims[2525] hons[2489] adderley[2488] rensselaer[2458] campground[2458] wabash[2441] hodges[2366] rollins[2360] binghamton[2353] kalev[2327] susquehanna[2274] kinshasa[2251] goswami[2250] rcd[2220] pinal[2202] juvenil[2155] drc[2055] smt[2036] kohler[2023] deshpande[2022] operetta[2018] trombonist[2014] vibraphonist[1982] schuyler[1973] utica[1933] bennington[1915] bachelors[1894] malone[1875] rutland[1866] zoot[1857] erie[1857] eldridge[1851] sandnes[1779] lacy[1764] fisk[1743] giuffre[1741] humphreys[1739] blanchard[1729] mlc[1706] ucl[1704] horvath[1698] macklemore[1695] erling[1684] mance[1679] nao[1675] rah[1667] jeunes[1666] hurley[1664] adirondack[1640] banaras[1627] bley[1617] lackawanna[1599] crouch[1581] fairleigh[1566] dewey[1551] mulligan[1545] highschool[1544] asu[1541] diliman[1521] suny[1494] acha[1488] snapper[1487] tamar[1480] lipscomb[1477] zorn[1472] haynes[1453] dutton[1450] shim[1447] lombardo[1447] ruff[1446] scranton[1437] bhagwan[1434] monk[1430] swarthmore[1421] laine[1407] sayre[1406] cheatham[1403] 
topic33=ky[21829] segunda[16119] alaska[13736] whorl[9457] oblast[8538] lsu[6648] deanery[6135] parishad[5905] bardhaman[5176] pct[4973] apa[4624] porches[4565] selo[4450] potomac[4144] yukon[4066] transom[3943] macon[3584] purulia[3258] anchorage[3222] gabon[3144] gana[3085] banga[3071] littéraire[2998] krai[2983] roanoke[2935] bolivar[2911] confluent[2881] bankura[2799] trinamool[2766] fredericksburg[2559] fairbanks[2440] taney[2405] appomattox[2333] midshipmen[2313] bethel[2210] chattanooga[2165] vassar[2146] swanson[2119] máximo[2081] diocesan[2025] juneau[2001] burdwan[1981] piney[1932] boonville[1888] danville[1883] eritrea[1871] pmc[1858] natchez[1834] greenfield[1808] meade[1800] charlottesville[1798] archdeaconry[1767] positio[1685] sampson[1648] cuny[1628] hardt[1626] nome[1619] sáenz[1617] mckinsey[1613] eritrean[1600] photojournalist[1563] anam[1554] manassas[1538] becca[1518] antietam[1515] tiller[1509] flor[1505] residentiary[1479] sarasota[1474] kamchatka[1453] asmara[1449] tyumen[1442] territorial[1434] wheeling[1427] ibarra[1425] scc[1420] rajeswari[1414] mim[1407] linares[1396] kirti[1393] everglades[1390] garo[1387] hooker[1385] faridpur[1383] parganas[1381] haley[1367] valdez[1345] odell[1341] hopson[1340] mccallum[1336] juventud[1331] bashkortostan[1329] haines[1329] stedman[1328] sigman[1322] townsite[1311] esta[1302] ase[1283] yancey[1281] mikhailovich[1279] 
topic34=são[30792] brazilian[28272] brazil[26472] da[23147] portuguese[21663] paulo[19983] rio[17515] janeiro[16598] do[14436] verde[14094] portugal[13943] silva[12169] praia[11930] cape[10281] porto[9975] paulista[9747] joão[9311] dos[8387] santos[8246] josé[6865] grande[6854] brasil[6452] santo[6088] vitória[6037] lisbon[5796] antónio[5779] pereira[5651] amazonas[5633] oliveira[5531] pedro[5384] verdean[5197] serra[5050] ribeira[5050] ferreira[5009] fogo[5006] sul[4899] carlos[4751] paraná[4710] souza[4601] bahia[4568] maria[4482] gomes[4425] das[4405] vicente[4267] novo[4229] pará[4080] luiz[4072] rodrigues[4056] minas[4052] almeida[4045] martins[3975] vila[3948] santa[3916] mendes[3836] luís[3830] santiago[3794] ponta[3760] antão[3720] lopes[3666] dias[3650] amazon[3540] island[3536] guimarães[3534] mindelo[3473] boa[3463] madeira[3462] ramos[3421] filipe[3397] gerais[3391] vasco[3327] jorge[3325] fernando[3299] garcia[3284] costa[3267] os[3205] sal[3187] maio[3163] brasileiro[3069] globo[3051] carvalho[3028] nicolau[3011] catarina[2993] goa[2947] joaquim[2935] augusto[2921] henrique[2900] quito[2893] manuel[2891] botafogo[2875] sousa[2864] andrade[2829] cardoso[2823] rocha[2780] antônio[2773] cruz[2747] sawan[2733] fonseca[2730] brava[2698] alegre[2684] monteiro[2673] 
topic35=german[84172] der[60185] von[50600] und[47759] berlin[37595] germany[35106] die[30779] hans[24378] für[18196] munich[17907] hamburg[17775] karl[17350] austrian[17069] vienna[15894] des[15042] im[14835] friedrich[14703] leipzig[14059] das[13599] johann[13314] wilhelm[13097] heinrich[12632] bundesliga[12613] austria[12586] franz[12337] zur[11947] georg[11765] frankfurt[11725] verlag[11711] ernst[11659] hermann[11196] fritz[11108] ein[10913] ludwig[10645] deutsche[10470] otto[10185] isbn[9480] rudolf[9401] stuttgart[9059] bavaria[8413] baden[8380] werner[8373] wien[8284] geschichte[8155] carl[8090] cologne[8081] wolfgang[7677] rhine[7604] aus[7585] bonn[7571] zu[7306] saxony[7149] nazi[7128] max[7057] auf[7041] erich[6807] heinz[6711] dem[6688] walter[6657] den[6607] josef[6540] bremen[6539] gustav[6514] prussian[6484] württemberg[6436] huber[6361] mit[6315] düsseldorf[6305] müller[6279] eine[6228] weimar[6117] johannes[6082] münchen[5845] heidelberg[5771] bavarian[5667] kurt[5575] am[5530] bahn[5490] klaus[5483] mainz[5445] zum[5422] swiss[5387] deutschen[5295] graz[5236] jena[5235] spd[5213] dfb[5183] bei[5115] zürich[5087] theodor[5049] richter[4960] münster[4950] adolf[4914] brandenburg[4892] gerhard[4887] christoph[4797] paul[4736] fischer[4617] schmidt[4584] rhineland[4560] 
topic36=stakes[22960] jalandhar[20788] barcelona[15972] horse[15099] kenya[8786] kenyan[8683] amritsar[8654] race[8209] racing[7840] lengths[7382] catalan[7077] catalonia[6824] derby[6576] horses[6528] handicap[6371] filly[6301] colt[6248] jockey[6147] trainer[6020] stud[5769] races[5603] dressage[4930] stallion[4688] winner[4479] nairobi[4447] trained[4320] bred[4249] ridden[4186] triathlon[4065] stable[3992] mile[3949] runners[3928] equestrian[3809] thoroughbred[3805] win[3613] breeders[3611] breeding[3609] ayr[3498] sire[3386] prix[3380] keelboat[3357] sailboat[3241] andorra[3205] mallorca[3179] eventing[3172] epsom[3023] girona[2869] ernakulam[2813] fillies[2811] broodmare[2807] oaks[2801] run[2775] mare[2680] inflorescences[2666] kentucky[2640] hnl[2638] winners[2628] bhattacharya[2615] distance[2601] pedigree[2542] turf[2539] ironman[2492] catalunya[2431] reus[2310] sant[2252] harness[2219] josep[2158] foals[2136] tarragona[2120] rider[2084] mares[2070] farm[2064] dam[2056] balearic[1990] flat[1948] stables[1938] grizzlies[1927] churchill[1879] miquel[1858] gakuen[1828] pounds[1816] lleida[1815] grade[1798] jaume[1797] cup[1785] reina[1775] coloma[1761] gelding[1759] weld[1742] farrington[1741] complutense[1741] francesc[1729] bay[1727] maiden[1713] belmont[1704] park[1680] carruthers[1669] winning[1667] sabadell[1656] pace[1636] 
topic37=pb[13434] sb[13170] trump[10652] tunisian[10576] mustangs[10395] rook[10320] rider[8933] cyclist[8862] tunisia[7723] tunis[7608] nas[6726] vuelta[6583] caf[6415] pot[5088] darts[4954] bike[4877] bicycle[4828] fb[4675] poker[4496] mustang[4487] tn[3641] runway[3311] pekan[3117] cyclists[2975] doping[2797] abeokuta[2499] allentown[2445] amish[2434] cycling[2401] kao[2236] sdn[2185] sneha[2114] bmx[2104] finley[2091] puteri[2090] drone[2047] bicycles[2028] casino[2007] bikes[2002] inactivated[1966] bf[1885] itt[1862] slc[1832] yasir[1824] dart[1812] bhd[1794] nav[1769] kf[1763] jg[1734] psm[1712] sprinters[1682] kh[1677] sse[1670] criterium[1640] meritorious[1617] redesignated[1605] cycliste[1599] motocross[1546] cr[1543] fargo[1541] vb[1533] tt[1517] aoa[1515] akmal[1507] awang[1478] elgin[1474] estero[1469] rc[1464] mennonite[1447] neuwied[1439] tamworth[1437] nv[1416] sukhoi[1416] dunlop[1412] curtiss[1411] leighton[1386] bhavana[1361] nb[1328] aces[1318] fas[1316] cx[1305] fédération[1296] mosman[1296] saiful[1291] minesweepers[1289] fp[1288] ati[1262] oa[1252] fokker[1242] mcconnell[1229] cst[1217] bayonne[1216] koe[1211] anak[1200] uavs[1193] asphalt[1181] rr[1181] atv[1175] iata[1167] dh[1165] 
topic38=research[66060] education[65640] science[57921] institute[57702] professor[56700] students[52411] medical[49202] engineering[41422] hospital[38642] degree[37801] sciences[37695] award[36374] society[35746] academy[34619] medicine[31914] technology[30589] department[30450] health[30265] faculty[29353] fellow[29137] director[27706] president[26143] physics[25070] studies[24831] dr[23844] schools[23826] bachelor[23393] awarded[23144] association[22712] academic[22641] phd[20981] secondary[20952] indian[19839] women[19779] campus[19737] board[19691] india[19101] chemistry[19072] master[18978] worked[18933] program[18815] awards[18735] teaching[18308] graduate[18277] courses[18133] scientific[18030] center[17551] graduated[17412] student[17315] educational[16940] arts[16818] council[16160] foundation[16112] mathematics[15962] chair[15729] training[15628] laboratory[15425] contributions[15016] study[15005] computer[14526] ph[14491] founded[14480] prize[14461] management[14224] vice[14122] established[14103] teachers[13652] institution[13455] biology[13442] studied[13399] development[13181] doctorate[13003] teacher[12624] fellowship[12577] senior[12391] assistant[12347] scientist[12345] associate[12340] programs[12158] harvard[12135] girls[12115] earned[12036] undergraduate[12035] children[12034] library[11783] appointed[11751] elected[11744] economics[11678] committee[11564] higher[11466] lecturer[11451] california[11352] universities[11350] centre[11176] taught[11065] dean[11048] technical[11032] social[11004] medal[10968] doctoral[10926] 
topic39=hai[8470] gaon[7563] asha[7267] kapoor[6431] khan[6388] bollywood[6282] ki[6257] hindi[6214] hum[6039] dil[5762] ek[5625] rani[5603] zee[5353] kristiansand[5229] ke[4966] kumar[4819] prem[4813] kamal[4777] patel[4442] mehta[4285] pandit[4277] vest[4251] bengali[4066] kriegsmarine[4011] dutta[3883] meena[3864] kishore[3779] wunderlich[3687] ravindra[3601] shree[3445] bhatt[3410] mein[3293] bir[3278] lata[3276] begum[3272] rafi[3221] pandey[3204] yeh[3201] uttarakhand[3195] lund[3181] ashok[3159] na[3124] malmö[3097] priyanka[3068] vidya[3003] se[2930] oberleutnant[2901] sinha[2890] varun[2885] gaurav[2823] dey[2784] chandran[2784] supercharged[2743] tum[2742] eifel[2739] aur[2734] freiburg[2688] mangeshkar[2621] pyaar[2590] dinesh[2553] khanna[2549] tiwari[2535] satya[2526] aman[2473] bhai[2448] gaya[2436] malhotra[2386] mera[2366] deepa[2295] mohammed[2267] arora[2262] hatun[2260] castleford[2259] lyricist[2259] cine[2230] kiel[2225] ashish[2220] astana[2203] onna[2198] chopra[2179] sameer[2170] schweiz[2159] voss[2134] narendra[2134] teri[2086] beşiktaş[2086] hoon[2071] govind[2047] mukesh[2047] fri[2047] sultana[2045] saratov[2006] nicosia[1955] pyar[1946] kanta[1933] bhi[1931] sivan[1920] rennes[1915] hain[1897] bursa[1875] 
topic40=business[62311] bank[45296] management[42236] founded[39588] development[39578] million[38854] services[38106] companies[37701] nigeria[36136] ceo[32953] investment[31178] financial[30730] industry[29560] global[28739] products[27817] market[27164] economic[26174] organization[25446] nigerian[25403] executive[24896] firm[24463] board[23795] countries[23356] technology[22978] marketing[22843] social[22456] billion[22408] capital[22361] project[22334] fund[21965] president[21728] trade[21066] co[21000] africa[20931] chairman[20698] foundation[20602] food[20587] health[20255] finance[19930] media[19487] energy[19432] founder[19318] community[19003] director[18935] uk[18182] entrepreneur[18169] projects[18150] policy[17883] ltd[17875] education[17753] brand[17737] private[17729] funding[17581] association[17550] agency[17412] exchange[17407] sector[17367] corporation[17295] established[17162] india[16963] largest[16918] partners[16750] wheatbelt[16684] corporate[16636] insurance[16540] information[16275] program[16190] inc[16165] ministry[16131] online[16063] launched[16011] research[15944] european[15927] department[15868] office[15320] employees[15231] tax[15116] support[15069] limited[14933] banking[14904] owned[14863] stock[14791] organizations[14747] network[14505] security[14413] profit[14303] chief[14171] activities[13805] relations[13628] us[13620] businesses[13484] awards[13418] acquired[13364] resources[13349] non[13274] investors[13187] venture[13118] headquartered[13105] provides[13096] environmental[13068] 
topic41=church[184207] bishop[142661] catholic[79829] roman[57586] cathedral[49352] diocese[48368] pope[45495] parish[40096] priest[36907] archbishop[34780] ordained[29370] saint[28537] chapel[27905] titular[27380] consecrated[26688] papacy[25253] prelate[24380] appointed[22328] holy[20630] monastery[19511] churches[19174] cardinal[19067] religious[17061] apostolic[17038] episcopal[16749] italy[16431] abbey[16390] giovanni[16301] bishopric[15664] mary[14738] christian[14601] christ[14522] biography[14335] catholicism[14280] seminary[14230] san[14120] congregation[13975] convent[13940] bishops[13781] saints[13735] vicar[13669] di[13663] theological[13334] jesus[12991] rome[12699] virgin[12675] paul[12564] orthodox[12463] anglican[12396] francesco[12108] latin[12077] altar[12045] missionary[12002] baptist[11974] theology[11936] basilica[11743] lady[11291] rector[11271] organ[11034] province[11005] maria[10881] our[10626] dedicated[10464] italian[10344] pastor[10288] madonna[9993] rev[9860] nave[9757] town[9320] santa[9233] ezekiel[9035] baroque[8993] antonio[8823] pietro[8685] palazzo[8597] blessed[8580] mission[8565] temple[8542] rite[8488] della[8426] chaplain[8137] painted[8127] ecclesiastical[8107] francis[8035] co[8026] joseph[8007] depicting[8000] pius[7995] patriarch[7985] carthage[7761] frescoes[7731] altarpiece[7703] fr[7603] jesuit[7503] lutheran[7486] building[7470] founded[7466] battista[7298] ancient[7241] abbot[7204] 
topic42=journal[58343] professor[44799] research[42675] book[35563] studies[31926] editor[27187] science[25575] social[25242] philosophy[23378] press[23086] books[22043] society[21390] publications[20222] theory[19708] academic[18954] psychology[18327] isbn[17815] author[17415] articles[16041] political[15678] institute[15406] study[15332] sciences[15327] economics[15036] scientific[14726] pp[14136] reviewed[13987] review[13396] phd[13359] peer[13080] oxford[13006] language[12736] literature[12097] edited[11417] culture[11187] works[11033] cambridge[10854] sociology[10731] law[10729] historian[10691] education[10472] politics[10296] harvard[10228] citation[10126] journals[10101] vol[10000] ph[9973] analysis[9948] fellow[9891] human[9769] scholar[9752] faculty[9704] how[9689] association[9562] ed[9484] cultural[9309] co[9149] policy[9043] anthropology[8934] impact[8916] studied[8829] media[8742] thesis[8721] chief[8467] gender[8427] historical[8409] publication[8384] associate[8147] volume[8032] quarterly[7987] ethics[7948] papers[7935] taught[7931] linguistics[7922] publishing[7867] authored[7826] teaching[7793] feminist[7726] knowledge[7716] issues[7674] selected[7664] dissertation[7630] modern[7486] topics[7470] economic[7424] edition[7363] reports[7255] eds[7229] religion[7225] editorial[7140] lecturer[7126] department[7103] yanow[7080] bibliography[7017] doctorate[6960] humanities[6943] women[6786] london[6779] prize[6754] covering[6705] 
topic43=doubles[62010] singles[38664] tennis[30510] tournament[27274] semifinals[24132] atp[23542] qualifier[23072] runner[20096] clay[18443] quarterfinals[18148] entrants[16119] nr[15924] itf[15746] tournaments[15450] heats[15356] winner[13941] hard[12340] challenger[11742] open[11226] ranking[11085] partner[11048] semifinal[10665] seed[10456] prix[10207] qf[9065] seeded[8771] courts[8385] lil[8120] tour[8040] rankings[7760] contestant[7207] finals[6946] bye[6738] qualifying[6712] slam[6267] surface[6133] grand[6068] airdate[5986] eremophila[5964] title[5914] sf[5834] loser[5731] imdb[5660] contestants[5552] seeding[5168] carpet[5015] fastest[4736] wimbledon[4631] elena[4626] partnering[4617] teaser[4483] quarterfinal[4461] partnered[4219] women[4198] runners[4031] forster[3975] grass[3675] defeated[3626] titles[3608] wildcard[3537] northridge[3525] mixed[3469] davis[3443] michelle[3383] kendrick[3293] danielle[3148] yana[3119] gaga[3067] anna[3053] nicole[3053] laura[2949] petra[2900] maria[2894] sets[2893] finalists[2776] edm[2762] kristina[2688] andrea[2670] julia[2651] paula[2648] olga[2638] femina[2627] anastasiya[2622] snoop[2588] simona[2588] lukáš[2570] jessica[2550] soler[2535] stefani[2514] cr[2511] mullins[2471] jennifer[2467] sandra[2451] janeiro[2451] outcome[2445] billie[2436] samantha[2414] masters[2352] raven[2340] kwok[2322] 
topic44=roman[10874] rome[9933] portico[9040] lucius[8415] gaius[8213] consul[7477] goalscorers[6900] bc[6832] marcus[6377] gens[5447] quintus[5062] ovate[4888] civitas[4847] walsingham[4697] publius[4451] villanova[4309] ad[4131] stuccoed[3961] titus[3890] doric[3776] cicero[3468] premio[2967] aquila[2792] racine[2785] balustrade[2755] caesar[2632] fireboat[2608] virtus[2540] canvases[2500] naft[2421] galleria[2402] freedman[2291] lnb[2280] gladiators[2254] julius[2245] minerva[2169] omnium[2165] aulus[2139] cassius[2030] église[2022] severus[1985] gnaeus[1976] romaine[1976] palladian[1964] larissa[1933] tribune[1923] dio[1921] suffect[1899] poésie[1876] prefect[1868] daphne[1859] âge[1830] claudius[1822] ettore[1813] cornelius[1791] palmyra[1789] teniers[1787] ancient[1775] pavić[1773] barberini[1755] captaining[1742] poli[1722] alii[1704] urbino[1696] renzo[1676] bene[1675] blaenau[1672] adonis[1663] mazandaran[1661] consulship[1650] komnenos[1639] lodz[1635] morelli[1624] nazionale[1622] castellan[1613] consular[1606] pliny[1593] francesca[1576] doria[1572] tiberius[1560] monti[1548] legate[1546] leaden[1543] maximus[1522] antonius[1520] nero[1495] nuovo[1492] secundus[1482] editore[1481] lamia[1474] altieri[1470] tacitus[1469] manrique[1461] italia[1460] tentatively[1457] moselle[1452] paget[1451] conti[1441] scipione[1441] proconsul[1427] 
topic45=cast[106746] episode[104407] television[101916] tv[97481] films[97428] directed[95271] actor[86329] award[84352] awards[83283] role[80312] festival[77785] actress[77215] show[77209] drama[76848] director[72861] episodes[71451] filmography[63669] comedy[63262] production[61761] story[61191] plot[60452] theatre[60104] movie[59504] title[55736] love[55489] documentary[51693] producer[50330] starring[45898] short[45616] man[44930] produced[44609] miss[43617] roles[42418] written[41939] novel[41055] girl[40519] stars[39199] character[38285] young[37743] cinema[37252] play[37030] mother[35824] star[34945] premiered[34559] feature[34062] you[33957] reception[32984] appeared[32162] aired[32037] my[31905] writer[31751] father[30700] woman[30434] lead[30232] nominated[29931] characters[29326] release[28684] night[28631] stage[28334] acting[28280] supporting[27766] productions[27653] book[27639] entertainment[26206] co[26185] reviews[25469] theater[24723] channel[24384] broadcast[24226] selected[23074] video[22948] horror[22891] black[22884] voice[22818] animated[22709] wrote[22704] guest[22695] debut[22683] special[22518] boy[22165] live[22153] get[21959] network[21814] shows[21463] go[21445] filming[21406] friend[21402] pictures[21348] critics[21279] category[21205] starred[21168] children[21122] thriller[21068] wife[20865] nominations[20865] worked[20852] mr[20666] screenplay[20657] actors[20642] premiere[20495] 
topic46=party[204527] election[190588] elected[103424] assembly[97060] minister[91615] politician[82753] democratic[82621] elections[80227] council[73314] president[72764] votes[72189] republican[70852] legislative[68967] parliament[60172] candidate[57620] district[54525] political[53260] secretary[50612] constituency[48971] senate[48504] electoral[46748] committee[46693] mayor[46247] vote[44440] deputy[43741] representatives[43491] seat[43159] governor[42276] law[41649] liberal[41624] seats[40976] presidential[39158] candidates[38596] incumbent[38495] appointed[35910] union[35228] term[34940] results[34404] labour[33338] court[32926] representative[32692] chairman[32324] parliamentary[31656] labor[30604] affairs[30478] congress[30264] office[30070] vice[28940] cabinet[28177] justice[27945] sarpanch[27449] ambassador[27056] trump[25821] attorney[25269] leader[24881] democrat[23762] ministry[23742] conservative[23072] representing[22864] primary[22864] voters[22802] independent[22416] prime[22211] senator[21688] legislature[21617] politics[21590] worked[21353] socialist[21297] supreme[21184] degree[21151] graduated[20869] board[20853] chief[20760] judge[20434] education[19355] federal[19307] communist[19162] campaign[19050] executive[18950] lawyer[18823] social[18245] polling[17995] coalition[17986] re[17921] commission[17838] alliance[17605] represented[17408] defeated[17367] ran[17299] wisconsin[17248] commissioner[17087] serving[16986] joined[16954] position[16900] foreign[16606] voting[16471] parties[16068] municipal[15382] director[15360] mp[15269] 
topic47=japan[41130] japanese[37224] tokyo[21680] myanmar[16827] fuji[14322] prefecture[13669] mbc[11758] subdistrict[10368] shenzhen[10012] osaka[9678] peng[8471] nagano[7475] kyoto[7440] universiade[7133] burmese[7043] xiang[6925] burma[6891] lim[6834] feng[6677] tianjin[6492] yangon[6285] dalian[6170] jiangsu[6106] aung[5991] hokkaido[5983] niigata[5809] pilbara[5808] seok[5607] wa[5516] sbs[5503] nippon[5423] xie[5283] nagoya[5189] prefectural[5145] fukuoka[5139] asahi[4967] kalgoorlie[4941] hiroshima[4732] sumo[4543] edo[4385] ono[4227] anhui[4204] yao[4096] sapporo[4051] nhk[4048] buri[4023] shan[3936] nsw[3925] lei[3876] okinawa[3854] nakamura[3853] guizhou[3843] maung[3638] prema[3638] kyaw[3614] fukushima[3565] harbin[3551] chiba[3528] myung[3455] hainan[3433] meiji[3408] lateritic[3254] nara[3173] yamaguchi[3160] mandalay[3092] ganesan[3077] yunnan[3061] tokugawa[3059] zhi[3036] ldp[2998] qingdao[2997] zhuang[2914] china[2867] nagasaki[2860] emperor[2849] saitama[2821] sho[2801] dai[2782] sakai[2717] sendai[2697] aditi[2673] ji[2647] tun[2635] ito[2624] hsiao[2620] okayama[2617] haruka[2601] nan[2584] kagoshima[2575] maeda[2564] myint[2561] ningbo[2559] nihon[2555] xun[2510] sakhalin[2454] dong[2451] kanagawa[2430] keqiang[2417] kumamoto[2411] ku[2396] 
topic48=ochreous[46314] blackish[29099] yellowish[11788] vevo[6894] sheathed[5193] paler[4989] elongate[4490] buff[4326] everest[4287] leumit[4130] parramatta[3939] faintly[3568] sssi[3301] shining[3262] bluish[3190] mountaineers[3075] beitar[3057] blotches[2975] specks[2516] ariana[2466] sikkim[2432] collie[2078] crosse[2068] washburn[1992] pubmed[1909] sriram[1893] britney[1877] lilac[1801] osu[1757] mcnulty[1717] lindley[1707] hut[1700] streaks[1683] stripe[1682] waterfalls[1664] stillwater[1660] eardley[1644] jamieson[1641] azalea[1641] flume[1637] neogene[1611] pizza[1582] burger[1570] sportswear[1563] cassie[1547] subpopulations[1509] cofounder[1456] cyrille[1438] loir[1436] effie[1430] csiro[1421] gonville[1395] tinashe[1389] ashanti[1380] sturt[1368] emery[1366] hadassah[1350] behar[1348] minden[1347] tepals[1347] trekking[1324] azar[1306] murrumbidgee[1305] brickell[1302] bondi[1288] nir[1250] naas[1243] ashdod[1242] faint[1230] duffield[1227] whorls[1220] electropop[1215] werft[1204] guttenberg[1203] earlham[1202] hornbeam[1197] fries[1192] fasano[1192] badger[1183] sunfish[1180] riverton[1176] backstreet[1176] cramer[1173] dinoflagellates[1165] charly[1163] alana[1158] thallus[1149] sandwich[1147] tunbridge[1146] mcdougall[1138] hanni[1134] swimwear[1118] sherpa[1115] flaky[1113] cobbles[1109] heide[1103] prodromus[1089] rihanna[1055] caldera[1045] prathap[1045] 
topic49=army[98174] regiment[81470] military[62236] division[55636] infantry[55247] brigade[46857] corps[45109] commander[44190] battalion[38875] battle[38055] forces[36243] officer[35815] lieutenant[33570] artillery[33461] colonel[32259] command[32030] force[29849] rifle[26435] scout[24754] medal[23391] nd[22853] staff[21355] air[20957] unit[19476] troops[18918] chief[18888] promoted[18710] fought[18563] rd[18153] royal[17821] soldiers[17722] defence[17525] cavalry[16788] awarded[16734] civil[16662] commanded[16140] rank[15876] armed[15173] killed[14694] fort[14384] officers[14371] wounded[14349] units[13902] training[13870] operation[13623] camp[13384] guards[13136] operations[12895] german[12845] captain[12806] attack[12736] guard[12658] defense[12600] commanding[12460] drilliidae[11719] naval[11677] offensive[11584] combat[11520] brigadier[11499] front[11355] battery[11339] enlisted[11331] appointed[11282] tank[10776] navy[10681] campaign[10637] reserve[10443] intelligence[10280] fire[10083] scouting[10016] headquarters[9898] soviet[9865] deputy[9795] formed[9726] men[9679] regiments[9604] transferred[9584] cross[9561] commanders[9381] duty[9354] captured[9280] marine[9258] siege[9226] police[9112] security[9049] red[8994] joined[8910] volunteer[8893] squadron[8820] fighting[8796] assigned[8741] base[8726] soldier[8539] scouts[8409] px[8358] action[8314] field[8305] commando[8210] garrison[8149] legion[7905] 
topic50=kg[79138] wrestling[31050] win[21009] boxing[19512] heavyweight[16104] tko[14780] fight[13618] freestyle[13026] championship[12766] nov[11047] tbs[10817] title[9870] event[9570] mar[9536] sep[9387] judo[9336] champion[9277] middleweight[8683] match[8662] curling[8652] wrestler[8641] decision[8602] professional[8457] oct[8255] boxer[8105] ko[8039] dec[7854] feb[7775] unanimous[7722] tcu[7684] weight[7644] martial[7516] lightweight[7301] aug[7284] vs[7171] bout[7149] ud[6876] hawaii[6798] jun[6761] pts[6706] wrestlers[6318] hawaiian[6205] welterweight[5947] hispanicized[5860] submission[5816] loss[5806] ring[5553] ref[5506] round[5432] nbl[5431] taekwondo[5288] defeated[5191] quarterfinals[4985] div[4667] pro[4634] wbc[4613] iwrg[4488] fighting[4460] punches[4397] mma[4286] karate[4277] mixed[4154] kickboxing[4131] tournament[3985] light[3914] fights[3845] championships[3693] super[3671] sounders[3592] gnis[3588] record[3566] fencing[3376] fighter[3294] jan[3281] donbass[3236] doncaster[3188] night[3125] rua[3017] trapani[2936] opponent[2933] quota[2918] honolulu[2880] arts[2828] ultimate[2820] promotion[2791] cage[2696] avenida[2695] arena[2616] boxers[2600] fought[2575] events[2568] fighters[2567] matches[2565] kumite[2556] knockout[2546] date[2522] olivera[2499] lost[2492] amateur[2476] results[2467] 
topic51=gameplay[12152] pune[10372] lviv[10075] playstation[9759] panchayati[8950] bydgoszcz[5898] cuttack[5221] rk[5034] nintendo[5004] singha[4280] metacritic[3723] multiplayer[3593] srinagar[3546] ivano[3434] ds[3412] balaji[3353] mohun[3196] sega[3195] udaipur[3162] bhat[3143] ps[3143] oblast[3130] pokémon[3019] mladost[2932] frankivsk[2892] vimeo[2848] tucumán[2826] subotica[2796] ternopil[2610] sloboda[2568] sachin[2469] wii[2468] banja[2384] sxsw[2308] vadodara[2254] naves[2227] veronika[2216] petar[2209] katarzyna[2188] namco[2134] gabi[2133] cider[2106] hsinchu[2105] pölten[2083] chernivtsi[2040] zala[1937] sudheer[1936] wiz[1929] dk[1900] maratha[1878] marjan[1862] ulica[1856] adria[1796] bandai[1780] uzhhorod[1772] rathore[1742] gdansk[1707] épinal[1704] luv[1692] ua[1681] deconsecrated[1679] satara[1650] gornji[1646] hazrat[1625] jat[1600] gophers[1589] gdynia[1573] ato[1566] gazeta[1511] sanda[1486] möller[1478] yuko[1477] malwa[1461] nizamuddin[1457] mesto[1446] quilmes[1443] scooby[1434] xtreme[1421] chanda[1421] misiones[1410] fk[1404] saša[1404] vita[1394] hoshi[1392] jagir[1387] sopot[1372] choo[1367] pkp[1365] nitro[1356] moti[1340] sonic[1334] chhatrapati[1325] zielona[1320] jp[1312] mahal[1294] kole[1291] arakawa[1287] valjevo[1279] ahmednagar[1266] rajaram[1238] 
topic52=ngc[15404] star[13616] galaxy[13414] observatory[12048] solar[11438] planets[11073] planet[9815] eclipse[9471] constellation[9445] telescope[9154] earth[9134] astronomy[8733] sun[8447] asteroid[7412] magnitude[7253] stars[7208] minor[6669] astronomical[6647] astronomer[6222] discovered[5866] kitt[5595] xo[5532] orbit[5424] cluster[5329] galaxies[5087] mass[4957] stellar[4842] jupiter[4807] dwarf[4684] wsl[4507] cowdenbeath[4457] eclipses[4239] astrophysics[4224] tarun[4077] comet[4030] light[3995] kepler[3970] binary[3950] copulatory[3612] comets[3588] type[3526] pocock[3487] radius[3404] milky[3394] planetary[3384] cet[3338] discovery[3327] exoplanet[3325] observations[3295] galileo[3205] spiral[3187] survey[3166] system[3138] orbital[3114] orbits[3082] discoverers[2982] supernova[2970] prahran[2949] galactic[2930] distance[2878] spectral[2876] universe[2853] apparent[2803] au[2802] nebula[2782] herschel[2758] hubble[2748] hd[2747] visible[2738] temperature[2726] diameter[2684] hr[2651] infrared[2647] orbiting[2630] object[2623] variable[2608] faint[2565] yuka[2555] asteroids[2554] docent[2554] sigma[2524] moon[2464] globular[2456] shrubland[2429] neptune[2415] space[2385] brightness[2381] sn[2377] nasa[2357] gamma[2337] lambda[2332] roofline[2310] objects[2308] galapagos[2280] alpha[2277] anoop[2226] telescopes[2215] sky[2167] eso[2140] catalogue[2139] 
topic53=french[25749] jean[23543] france[14458] marie[14327] pierre[13758] philippe[12212] louis[12060] iaaf[11850] births[11435] françois[10502] count[10425] paris[10372] nationality[10049] events[9469] antoine[9347] jacques[9332] charles[9175] nicolas[8976] la[8541] deaths[8489] marcel[8426] rank[8158] le[8019] alexandre[7824] stade[7766] michel[7442] andré[7205] anna[7069] gallimard[7054] henri[6917] uci[6654] van[6433] gérard[6196] irina[6184] maria[6181] maurice[6139] bnf[6114] denis[6063] belgian[6013] olivier[5988] directed[5935] jeanne[5932] laurent[5929] jules[5823] madame[5770] claude[5752] saint[5614] ekaterina[5505] guillaume[5467] yves[5466] cast[5439] baptiste[5422] albert[5400] éd[5394] luxembourg[5383] furlongs[5322] paul[5310] robert[5306] universiade[5232] du[5177] sophie[5165] anne[4992] bests[4933] haiti[4844] bibliography[4814] rené[4814] married[4795] sainte[4785] submissions[4740] léon[4727] françoise[4713] armand[4698] foreign[4629] louise[4619] opéra[4583] von[4520] joseph[4500] brussels[4461] christophe[4460] georges[4434] germain[4409] frédéric[4401] andrey[4398] haitian[4382] prince[4284] elisabeth[4272] weeknd[4195] madeleine[4063] actes[4062] provence[4056] victor[3986] christian[3910] martin[3879] simon[3878] belgium[3858] sud[3849] andreas[3827] catherine[3800] comique[3792] marguerite[3754] 
topic54=consecrators[14133] consecrator[13244] auxiliary[11339] vietnam[10669] beatification[8909] vietnamese[8829] soundcloud[7535] archdiocese[6823] cambodia[6792] lough[5978] lega[5459] archery[4938] minh[4898] bari[4523] cambodian[4395] suffragan[3764] novara[3731] nguyen[3635] raghu[3566] palermo[3518] phnom[3453] serie[3325] beatified[3325] vicariate[3303] penh[3274] presbytery[3250] bodo[3155] khmer[3138] priory[3094] naver[3086] coolgardie[3085] recurve[3075] dewi[3068] soissons[2924] aspx[2855] hanoi[2852] kampong[2780] santi[2708] thanh[2685] calcio[2644] livorno[2638] catania[2632] venerable[2611] spezia[2609] buttresses[2589] negros[2523] belfry[2520] viet[2391] mac[2296] bareilly[2252] rectory[2227] vlad[2157] avellino[2126] varese[2123] ascoli[2090] rimini[2073] apr[2038] nam[2035] saigon[2019] ancona[1996] hagiography[1991] cagliari[1946] lecce[1943] dijk[1937] chi[1915] hoa[1865] exarchate[1853] ho[1847] ros[1834] mag[1831] mauretania[1820] carlist[1791] sasi[1781] udine[1758] bunga[1748] salerno[1719] uí[1692] laos[1689] malo[1687] huan[1668] severino[1664] harwich[1650] vercelli[1648] cheong[1648] indochina[1640] abbess[1611] kotte[1603] isola[1600] lop[1572] taranto[1570] myra[1560] livio[1543] kartli[1538] messina[1480] ní[1474] quang[1473] albans[1438] alessandria[1436] imperii[1402] progresso[1398] 
topic55=px[71769] rebounds[18292] freshman[14594] discogs[11613] hornets[10175] redshirt[9128] assists[7193] streptomyces[6854] bobcats[6790] steals[4649] aba[3939] gators[3740] ucf[3550] cavaliers[3096] georgetown[2769] conway[2601] lakers[2535] shl[2379] radford[2285] gator[2277] dynamos[2237] pnp[2141] zaria[2130] vmi[2085] hodge[2010] olsson[2001] nfb[1994] pacers[1990] ewing[1930] correia[1928] barros[1920] tor[1789] girton[1763] blazers[1730] donny[1645] ojo[1615] ginebra[1600] gatorade[1600] spg[1567] ima[1498] ridgeway[1452] semifinals[1354] padmanabhan[1326] hakeem[1319] yosuke[1309] polyhedron[1302] mcguire[1300] adidas[1291] efes[1275] scoring[1255] knicks[1236] ade[1205] layla[1173] penicillium[1169] buzzer[1168] phelan[1166] majored[1162] antigen[1135] hv[1120] feni[1111] finke[1100] honeycomb[1085] arifin[1084] antibiotic[1072] nahuel[1059] cohomology[1054] nordre[1028] mussoorie[1028] lejeune[1021] bowers[1018] pero[1016] mage[1005] homotopy[1002] polyhedra[981] hanka[976] nagel[972] lebron[968] guedes[951] vanya[936] cowritten[926] easement[923] ppg[909] glaucoma[875] sabo[875] gainesville[870] mobygames[858] antigens[856] bower[850] ppv[850] bharatpur[849] verdier[842] fgm[839] tallied[836] stainton[828] iverson[826] movimento[813] augsburger[793] schooler[790] ketone[790] romberg[779] 
topic56=king[43978] castle[30534] son[29120] prince[21403] ottoman[21204] empire[20991] emperor[19626] dynasty[19399] reign[18713] kingdom[17673] battle[15445] princely[14860] sultan[14258] ruler[13920] clan[13605] brother[13317] throne[12781] ruled[12726] daughter[12670] sources[12588] bc[12225] governor[12115] succeeded[11997] iii[11803] count[11731] father[11706] byzantine[11670] ce[11662] queen[11425] imperial[11397] royal[10656] kirkus[10163] princess[10148] army[10109] title[8827] treaty[8485] rajput[8472] fortress[8421] siege[8378] inscription[8357] married[8309] noble[8255] mentioned[8167] wife[8096] according[8055] khan[7994] palace[7994] sons[7884] tribe[7788] pasha[7757] rebellion[7724] rule[7605] military[7603] killed[7522] shah[7500] defeated[7309] duke[7274] safavid[7235] mother[7206] gujarat[7081] iv[7067] lands[6761] sent[6752] period[6695] chief[6638] ancient[6572] forces[6537] rulers[6399] crown[6192] kings[6189] captured[6163] court[6111] land[6100] consort[5992] medieval[5964] region[5919] capital[5906] bce[5899] probably[5879] troops[5831] sultanate[5684] conquest[5662] constantinople[5622] armenian[5595] georgian[5503] territory[5494] descendants[5464] roman[5439] bibliography[5433] revolt[5431] tbilisi[5377] successor[5363] led[5310] heir[5294] raja[5292] tribute[5289] ottomans[5242] town[5236] fort[5210] lord[5206] 
topic57=police[53815] women[53746] court[50040] law[44328] rights[41621] act[31618] prison[30057] case[29132] said[27749] political[27647] arrested[26013] party[23626] cannabis[23173] movement[23086] justice[21937] legal[21594] criminal[20966] president[20236] security[19543] union[19465] anti[19453] killed[19263] civil[18308] social[18277] violence[18180] murder[17823] attack[17412] children[17097] activist[16755] human[16734] investigation[16437] supreme[16358] organization[16272] workers[16214] men[16101] support[15808] crime[15741] trial[15604] stated[15598] we[15350] victims[15134] should[14555] authorities[14407] laws[14316] sentenced[14316] minister[14231] sexual[14217] freedom[14194] according[14142] committee[14129] accused[14099] cases[14089] right[14060] reported[14054] communist[13925] report[13846] officers[13597] claimed[13544] sex[13542] woman[13525] black[13475] protest[13417] military[13294] led[13251] media[13042] federal[12957] protests[12784] involved[12734] campaign[12555] decision[12538] lgbt[12482] arrest[12406] news[12193] incident[12060] strike[11974] community[11953] constitution[11912] officials[11865] illegal[11858] gay[11673] without[11601] council[11333] african[11292] prisoners[11200] sent[11138] commission[11094] charges[11059] newspaper[11055] camp[11029] leader[10947] person[10909] working[10897] department[10781] months[10748] issues[10725] article[10715] gender[10605] even[10581] peace[10566] bill[10543] 
topic58=scottish[61501] scotland[38163] edinburgh[33734] glasgow[31158] dundee[14221] aberdeen[13611] stirling[8945] thistle[8823] celtic[8718] dumbarton[8159] falkirk[8037] rangers[7864] dunfermline[7271] morton[6729] clyde[6309] loch[6286] hamilton[5689] queen[5447] clydebank[5193] albion[5153] inverness[4936] scots[4636] leith[4547] cairn[4362] lanark[4193] snp[4153] sutherland[4108] irvine[4049] berwick[3971] ayrshire[3930] highland[3910] argyll[3858] greenock[3855] alef[3854] sidings[3849] gaelic[3802] livingston[3684] galloway[3629] woolwich[3563] thomson[3533] blyth[3379] macdonald[3337] heart[3282] lothian[3152] arbroath[3139] gaels[2984] mackay[2963] fusiliers[2942] highlanders[2916] shetland[2876] flinders[2839] montrose[2821] hay[2792] ross[2786] hussars[2630] gazetteer[2599] faisalabad[2585] buchan[2573] orkney[2570] dunbar[2520] forsyth[2502] caledonian[2496] lanarkshire[2491] watt[2465] wizkid[2464] millar[2424] moray[2413] dumfries[2399] stela[2386] swinton[2360] bertie[2323] giza[2309] ratan[2285] dundas[2275] burgh[2264] fixtures[2263] andrews[2239] maitland[2236] aldershot[2213] infirmary[2191] balfour[2170] mckenzie[2163] blackwall[2129] régiment[2097] paisley[2093] melville[2069] barr[2043] emus[2037] guthrie[2031] macleod[2022] dela[2022] abercromby[2007] grafton[1992] blackwater[1988] caithness[1948] skye[1940] wingate[1939] strathclyde[1898] kerr[1878] baird[1830] 
topic59=gastropod[19895] snail[18727] brownish[17204] mandal[14931] fk[14810] dass[11818] spartak[8635] attains[8418] kathiawar[6662] ferruginous[5445] junagadh[4835] spacewatch[4228] trampoline[3803] dir[3669] saha[3441] submarginal[3178] manju[2974] bhavnagar[2837] bruckner[2815] baroda[2757] kantor[2573] hordaland[2556] iow[2548] nagendra[2488] ist[2481] yellowish[2432] birla[2380] lokesh[2370] preto[2344] poulenc[2312] kavi[2195] ndr[2180] sandown[2167] northam[2147] tarento[2109] mys[1991] eel[1924] catalina[1904] mau[1894] unplaced[1894] sohn[1892] longitudinally[1846] pavan[1830] të[1823] sudhakar[1807] wroclaw[1800] figs[1754] aerobatic[1746] gott[1738] uns[1730] partizani[1723] molluscs[1698] iridescent[1686] unser[1672] darter[1667] baumann[1659] freund[1638] alles[1637] mathur[1624] dürr[1617] sl[1603] centimeters[1592] graphene[1564] ryde[1561] schröder[1553] magazin[1542] fröhlich[1530] schauer[1528] bicolor[1526] sucker[1516] dich[1498] trond[1465] kanwar[1457] trøndelag[1455] wight[1430] ikaw[1410] förster[1404] gedichte[1390] ribbed[1385] geckos[1381] goby[1370] guppy[1368] hoff[1329] tereza[1327] riffles[1320] vidarbha[1305] laramie[1297] hau[1290] wingfield[1284] spots[1272] similis[1261] aerobatics[1260] hurwitz[1245] pumila[1240] rad[1225] ornata[1212] translucent[1211] kleiner[1206] stingray[1200] spirally[1197] 
topic60=chemical[16136] acid[13703] chemistry[13003] organic[10597] compound[10029] reaction[9887] synthesis[9552] carbon[7907] mineral[7890] compounds[7635] mnet[7138] acetate[6764] minerals[6709] ohl[6649] formula[6460] ch[6084] reactions[5807] hydrogen[5689] sodium[5420] structure[5331] newnham[5046] atoms[4973] methyl[4811] molecules[4732] ester[4562] gravelly[4526] oxide[4206] sulfate[4086] chloride[4062] nitrogen[4033] și[3964] molecular[3930] properties[3929] acids[3882] oxygen[3875] metal[3821] ion[3791] ovoid[3760] salt[3716] sulfur[3693] chemist[3662] pv[3611] atom[3597] oxidation[3544] ions[3540] liquid[3537] cl[3519] fluoride[3499] potassium[3393] enzyme[3387] solid[3342] crystal[3335] synthetic[3249] bond[3238] în[3182] electron[3163] phosphate[3139] polymers[3068] ligand[2983] molecule[2975] premios[2973] crystals[2925] solution[2922] lithium[2912] lng[2896] dota[2849] cobalt[2832] water[2782] coli[2768] complexes[2758] mg[2744] temperature[2725] iron[2719] na[2699] gsk[2672] complex[2638] abl[2631] nickel[2615] pharma[2553] ether[2549] dioxide[2540] synthesized[2535] gheorghe[2503] chromosome[2487] uranium[2474] catalyst[2460] salts[2458] metals[2451] soluble[2391] ring[2388] derivatives[2387] cu[2378] bromide[2358] copper[2355] zinc[2344] gas[2343] grigore[2295] nitrate[2290] catalytic[2284] icm[2280] 
topic61=cdp[12920] wine[10532] dinamo[9196] inseason[8298] inseries[7959] friendlies[7098] cska[6664] libero[5700] albion[5419] obispo[5218] winery[4988] hove[4730] fernandes[4675] northampton[4599] fulham[4481] notts[4238] chul[3927] sathya[3631] loughborough[3550] wines[3539] bournemouth[3448] gillingham[3298] muñoz[3252] luton[3245] swi[3228] everton[3221] marítimo[3113] hyaline[3050] grape[2930] bromwich[2898] kaiserslautern[2759] solberg[2643] vineyard[2465] cletus[2389] pref[2328] ava[2293] éditeur[2272] foursquare[2182] coursed[2180] shipley[2179] westlake[2164] tulloch[2138] hotspur[2127] golson[2116] soria[2088] badajoz[2079] vineyards[2068] maur[2036] hallam[2016] morecambe[1993] tóth[1957] metzger[1946] quilts[1926] ondrej[1919] glossop[1885] sportif[1883] ángela[1878] kimball[1828] rfa[1827] marriott[1817] schalke[1816] grapes[1815] cáceres[1804] boldklub[1790] grays[1783] bainbridge[1738] ashfield[1714] leiria[1700] roldán[1679] heck[1676] domaine[1666] merritt[1644] emerita[1629] napa[1541] barış[1539] howland[1533] tain[1531] apuestas[1530] hooch[1506] strýcová[1485] marwar[1475] blagoevgrad[1473] robles[1472] moline[1456] agostini[1455] rolland[1455] uesugi[1450] cellar[1439] burbank[1438] germantown[1429] pavol[1417] simi[1390] crum[1388] bsk[1383] linnea[1367] talavera[1366] chitti[1359] virgilio[1355] hitchin[1355] navarra[1339] 
topic62=nepal[23637] nepali[10959] grevillea[9052] swiss[8915] kathmandu[7680] basel[7447] canton[7003] nepalese[6466] bogotá[5626] rana[5335] mendis[4567] bahadur[4414] grimsby[4340] coins[4262] switzerland[4250] gopi[4236] coin[3906] bern[3872] thapa[3726] ukip[3501] roshan[3406] malla[3181] zürich[3148] duleep[3104] terriers[3017] medellín[2860] akash[2662] nasl[2605] hazare[2519] bochum[2448] argyle[2440] bundestag[2324] aomori[2312] annapurna[2308] volley[2282] nrw[2204] jäger[2134] kiran[2113] leb[2113] mint[2078] boyacá[2069] tiempo[2056] lucerne[2025] germaniawerft[1963] pratap[1946] gorkha[1915] oeste[1903] colombian[1899] bretagne[1894] kunwar[1892] tranmere[1888] laxmi[1850] banknotes[1812] farooq[1809] plon[1772] joakim[1737] domínguez[1698] bhanu[1676] gurung[1670] maoist[1641] minted[1640] pml[1626] redhawks[1622] zug[1600] wycombe[1596] cantons[1564] jons[1551] sita[1546] tapia[1546] durbar[1528] cundinamarca[1527] venegas[1525] kanazawa[1514] cantonal[1511] banknote[1508] leduc[1499] aeg[1476] socorro[1453] siècles[1436] caldas[1436] bikram[1412] sme[1394] antioquia[1380] yala[1364] baig[1364] carril[1355] cali[1353] worthing[1341] schmid[1330] rampur[1322] paisa[1315] hampden[1284] lalitpur[1282] restrepo[1281] jaeger[1280] jutra[1280] zeiten[1260] uribe[1259] tunja[1255] pati[1250] 
topic63=radio[68678] fm[63157] station[44147] tv[39087] channel[38915] news[32102] television[27641] broadcasting[25322] broadcast[25274] am[23622] owned[21525] network[17681] programming[14418] format[13922] pm[13739] stations[13247] show[12132] grupo[11974] naia[11804] watts[11577] program[10676] broadcasts[10675] media[10117] mhz[10040] khz[10025] sports[9092] licensed[8948] bbc[8933] digital[8918] channels[8877] programs[8488] host[7700] cable[7612] hossein[7311] evansville[7261] hosted[7061] carries[7029] launched[6812] nuytsia[6754] paraglider[6696] purdue[6509] abc[6414] cbc[6393] aired[6384] hd[6268] facebook[6082] https[6058] utep[5974] airs[5959] talk[5836] anchor[5717] ultralight[5612] satellite[5589] tehran[5566] air[5529] broadcaster[5450] cbs[5336] coverage[5310] transmitter[5221] nbc[5189] communications[5084] qom[5045] frequency[4997] entertainment[4983] shows[4893] operated[4837] saas[4779] programme[4739] boilermakers[4695] maxi[4643] live[4476] morning[4428] roubaix[4351] ary[4260] jtbc[4256] networks[4238] call[4146] fcc[4128] cnn[4115] sold[4103] current[4102] conus[4098] sony[4082] fox[4082] daily[4067] affiliate[4050] sky[4035] moved[3962] power[3900] quiz[3890] programmes[3882] esteghlal[3875] broadcasters[3774] www[3678] hour[3648] newspaper[3621] npsl[3617] corporation[3603] sign[3589] monday[3469] 
topic64=estonian[19662] pld[18647] pts[18624] tallinn[10902] gf[9847] estonia[8625] ga[8340] sheeran[6029] pim[5582] jamaica[5159] tartu[5101] diff[4776] sanremo[3993] srinivas[3886] jamaican[3858] chekhov[3838] reggae[3643] jeeves[3430] santosh[3315] anuradha[3281] margarete[3227] ska[3050] eesti[3003] karla[2943] dmytro[2924] tatiana[2881] chemnitz[2789] luhansk[2766] ivanovich[2760] gertrud[2721] ucd[2716] meri[2693] rsfsr[2684] shelly[2498] maroons[2472] dancehall[2459] kulkarni[2425] stepan[2350] galina[2313] gorky[2302] andriy[2299] nikolay[2231] meistriliiga[2148] erdmann[2145] mikkel[2134] boland[2120] shelbourne[2105] wooster[2030] duisburg[2009] popov[1964] guelph[1931] uyezd[1930] dtv[1920] от[1889] manne[1874] nayak[1843] ceramist[1837] semyon[1825] jaan[1821] kavya[1803] decca[1783] frauen[1778] sociedad[1774] года[1729] mishra[1728] webby[1716] fyodor[1708] oriel[1692] krauss[1684] rté[1678] milly[1676] placings[1663] vogel[1653] zwickau[1642] montpelier[1632] carrasco[1619] ifa[1615] verónica[1615] styne[1592] shakib[1581] artiste[1549] bolshoi[1529] bola[1508] bernt[1471] elmo[1454] eupen[1422] paulson[1387] shakey[1383] metcalfe[1375] brașov[1367] alka[1358] grigory[1354] arkhangelsk[1352] komsomol[1347] volkov[1346] yuna[1334] raghav[1331] bassey[1329] suzana[1323] televote[1319] 
topic65=turkish[39563] turkey[30760] cypriot[16832] istanbul[15260] ankara[10336] cyprus[9680] eparchy[6940] sarajevo[5938] zmir[5443] mehmet[4432] bendigo[4323] mustafa[4304] belediyespor[4257] adana[3956] chp[3708] viic[3594] ottoman[3553] ballarat[3415] krasnodar[3327] atatürk[3302] ahmet[3287] kemal[3255] limassol[3249] izmir[3233] spor[3069] gippsland[2781] masjid[2737] alp[2693] saeed[2685] konya[2675] ashraf[2600] marmara[2591] akp[2535] yemeni[2518] goulburn[2516] geelong[2500] anatolia[2441] kara[2436] bey[2431] eskişehir[2375] stv[2359] adil[2354] sana[2354] gauchos[2324] habib[2316] aoi[2273] siddiqui[2267] tff[2209] abubakar[2188] cochin[2185] bracknell[2185] trivandrum[2149] trabzon[2139] gazi[2121] swat[2116] sudhir[2080] alam[2070] province[2070] ali[2048] balıkesir[2044] podemos[2021] homs[1996] larnaca[1995] emre[1968] anatolian[1965] yekaterinburg[1942] batumi[1925] travancore[1924] mosque[1916] faisal[1910] pasha[1906] kaya[1865] junín[1864] shabab[1859] manchukuo[1819] khalil[1803] salam[1803] shahid[1795] shaheen[1785] adel[1776] faiz[1772] anadolu[1759] mhp[1759] hasan[1758] sahel[1756] oakes[1749] lucía[1693] samsun[1691] shaikh[1674] dsb[1650] nanterre[1636] türk[1626] kayseri[1615] gulshan[1614] vizier[1609] muğla[1606] cecilie[1601] juba[1594] brookmeyer[1591] mian[1590] 
topic66=fresno[13764] greyhound[10260] javelin[7619] dragonfly[7386] steroidal[5100] fullerton[5020] nguyễn[4844] bieber[4753] dog[4131] renard[4106] rohit[4029] dogs[3619] bearcats[3106] putra[2677] fawn[2485] vibraphone[2452] hwang[2433] sinaloa[2413] văn[2347] kamala[2200] mondo[2179] stetson[2054] klamath[2018] trần[1984] tonbridge[1900] hoàng[1895] viswanath[1878] marin[1857] greyhounds[1838] hare[1833] mcgowan[1773] gotra[1742] prodrug[1725] grimes[1702] spars[1702] zedd[1701] kern[1641] nik[1580] не[1551] hawker[1537] philpott[1509] showbiz[1504] trouser[1485] bonny[1472] sykes[1471] mavis[1468] markings[1458] medico[1440] bonita[1440] csu[1438] sequoia[1434] jui[1412] hanley[1398] hamer[1398] hoosier[1386] sap[1382] stitt[1377] tailplane[1358] bland[1353] yosemite[1352] gia[1343] furlong[1340] youn[1319] grocer[1319] kjeld[1311] ngọc[1295] furies[1271] tuolumne[1267] hồ[1246] leek[1246] spar[1232] phạm[1230] plywood[1195] orb[1191] roadrunner[1191] kramer[1184] jarman[1182] pandan[1182] kokomo[1176] chaz[1160] vanna[1144] monserrat[1140] inglewood[1134] beale[1127] ultron[1118] mattress[1112] joaquin[1112] netto[1105] wylie[1105] hickman[1095] hutchings[1078] davie[1070] shimmy[1056] allman[1052] lederer[1037] kaan[1036] fearless[1035] bessie[1021] tunstall[1015] linh[1012] 
topic67=paralympics[21536] ccaa[18081] agder[12558] brentford[6189] palearctic[4950] på[4141] pinkish[3514] islet[3361] exo[3350] metalcore[3151] manuela[2935] androgenic[2906] commodores[2860] odham[2859] gotland[2810] gowda[2685] figueroa[2520] kafr[2310] loma[2138] petersen[2125] pinoy[1999] superfast[1990] shinee[1972] streaked[1967] siempre[1930] nassar[1923] mep[1921] millennials[1881] shinde[1859] bajwa[1840] michaud[1801] boyband[1801] ssp[1773] julián[1753] señor[1738] heraklion[1727] hutt[1715] quartzite[1708] voronezh[1696] sculpin[1619] egger[1612] moser[1593] muerte[1586] combinatorics[1561] eugenia[1526] dorsally[1501] öztürk[1483] pineda[1479] indica[1472] omg[1465] mcalister[1410] vermillion[1396] nunatak[1389] capsid[1377] marquez[1356] aggarwal[1333] oulu[1313] jono[1312] amigos[1303] tijuca[1300] coker[1297] rethymno[1290] farrugia[1289] highbury[1268] alcalde[1261] garay[1218] janka[1217] aves[1192] bahía[1158] carvajal[1156] inermis[1153] quintero[1149] keating[1140] sombra[1131] sobre[1122] brevard[1118] yuval[1117] bohol[1110] lua[1102] acuña[1098] mackey[1096] cen[1087] acero[1080] alon[1069] greifswald[1065] downie[1062] rosalia[1059] vamos[1048] cayley[1047] futuro[1045] fama[1035] macgyver[1016] sangre[1014] byers[1007] småland[1004] nati[990] burch[965] viralzone[964] marañón[952] ofi[948] 
topic68=bandcamp[8826] jee[7305] tibetan[5637] mixtapes[5307] wd[5126] volkswagen[5120] mercedes[4802] benz[3995] psa[3798] taichung[3758] mahindra[3749] jeon[3620] opel[3555] lama[3330] navajo[3116] romeo[3096] tibet[3057] alfa[2825] xd[2696] mbs[2675] sewanee[2646] hyundai[2572] lego[2356] jaguar[2261] volvo[2229] changchun[2197] katya[2185] bts[2152] muthu[2112] chrysler[2079] lms[2077] vyas[2055] myeon[2051] amg[2029] khao[2016] gakuin[1993] tesla[1971] lubin[1921] turbo[1887] étoile[1887] supercar[1818] hokkaidō[1799] yamagata[1771] jinja[1763] subaru[1748] rinpoche[1723] lj[1722] lamborghini[1699] wolfsburg[1693] fader[1670] aiff[1669] albarn[1667] soko[1658] ods[1658] shao[1651] mitsubishi[1643] evgeniya[1638] erc[1631] surendra[1621] stig[1617] aberdeenshire[1577] mes[1577] rst[1569] dalai[1559] chameleon[1558] suwon[1536] zou[1519] ehime[1511] jeonju[1505] kuroda[1495] sema[1473] suan[1462] cummins[1450] mino[1430] anju[1423] muay[1409] kalu[1404] tiverton[1393] tomo[1392] roxanne[1392] wg[1380] fo[1354] bap[1343] mgr[1339] wheelbase[1335] cuentos[1310] huntly[1293] hsing[1292] knowle[1281] wyeth[1278] cmc[1273] naz[1272] jairam[1270] gl[1263] ordinariate[1255] trayvon[1253] kho[1250] comer[1249] gorillaz[1249] falcone[1208] 
topic69=ireland[53962] irish[42049] dublin[25400] hockey[24202] uci[22086] usl[20943] cork[19614] blotch[13495] munster[12579] galway[12321] gaelic[12242] icelandic[12101] leinster[12005] nhl[11835] pts[11596] tipperary[11523] limerick[11309] ice[11280] gp[10632] rakyat[10524] championship[10345] senior[10234] ulster[9570] dewan[8960] kilkenny[8806] bn[8768] gaa[8541] belfast[8297] lokomotiv[8061] waterford[8016] mayo[7571] connacht[7192] townlands[7123] goaltender[7102] iceland[7009] meath[6897] barony[6702] cavan[6699] sanath[6176] umno[6126] concacaf[5986] pdl[5949] kerry[5847] ie[5659] totals[5371] bruins[5195] derry[5158] donegal[5047] clare[4937] wexford[4898] sligo[4891] greenlandic[4717] playoffs[4423] antrim[4396] offaly[4361] neill[4359] dap[4319] kildare[4247] westmeath[4239] wicklow[4226] tyrone[4212] agg[4174] longford[4088] dundalk[4066] goalkeeper[4014] schuckert[3990] ofoverall[3888] cyclo[3816] kickers[3784] whl[3680] armagh[3652] monaghan[3620] roscommon[3554] patrick[3504] keane[3439] larsson[3395] na[3384] carlow[3338] geylang[3310] bk[3301] louth[3295] persson[3289] flyers[3133] muda[3104] debutant[3077] reykjavík[3068] ik[3059] siddique[3046] kells[3038] otl[3026] antigua[3010] gf[2962] leitrim[2946] mohamad[2907] toros[2887] mac[2846] mahathir[2835] connell[2819] kampong[2819] onn[2771] 
topic70=meyrick[28883] ghana[19003] mexico[15057] stigmata[12377] mexican[11865] pakistan[10731] ghanaian[9260] arunachal[8402] karachi[7206] sindh[6834] svg[6384] méxico[5867] accra[5295] khyber[5281] pakhtunkhwa[5132] kalan[4968] aztecs[4933] manipur[4871] lahore[4811] stenoma[4497] veracruz[4447] cantonment[4338] puebla[4098] ciudad[3749] ruiz[3746] monterrey[3713] chak[3656] mujer[3620] jalisco[3617] álvarez[3529] yucatán[3462] khurd[3431] gila[3402] nagaland[3311] mizoram[3283] peshawar[3195] pima[3170] sindhi[3167] icon[3144] volta[3114] chiapas[3076] artes[3073] michoacán[3044] stabling[3033] quetta[3032] racecourse[3001] unidos[2963] chihuahua[2949] escobar[2882] paso[2881] ayala[2871] erdoğan[2845] rubio[2841] cebu[2763] sonora[2722] lès[2718] balochistan[2701] baloch[2644] belize[2624] coahuila[2619] azam[2583] valdés[2570] pavn[2533] marne[2505] maya[2503] rawalpindi[2482] bagh[2458] hacienda[2456] mahavidyalaya[2413] southgate[2410] kabaddi[2396] hidalgo[2366] gonzalez[2324] kst[2317] rawat[2273] guadalupe[2245] recep[2244] acosta[2242] morelos[2241] querétaro[2193] punjab[2175] tripura[2165] nieto[2144] guanajuato[2120] meghalaya[2083] bellas[2074] ascot[2054] raphaël[2040] guerrero[2017] matheus[1985] cárdenas[1965] mexicano[1956] salas[1951] foaled[1933] mandi[1921] renaud[1914] pumas[1912] contreras[1886] isidro[1883] pueblo[1883] 
topic71=fascia[12441] bergfelder[9107] xu[7879] spotify[4734] ferrari[4615] matsumoto[3660] mei[3586] jia[3303] sina[3183] luo[3151] ueda[2971] yamada[2967] meera[2882] suresh[2694] shimizu[2670] asom[2595] lazer[2498] lola[2497] yoshida[2425] saroja[2403] ren[2393] longhorns[2390] hana[2380] chou[2311] zhen[2300] maserati[2285] ning[2272] miki[2224] vcu[2155] aoki[2091] vk[2084] amon[2062] peuples[2028] masaki[2009] japanese[1974] dandenong[1962] vauxhall[1916] qiao[1899] prêmio[1857] nico[1843] horan[1803] midwife[1787] kenji[1763] hashimoto[1742] qiu[1733] lian[1728] midwives[1724] mercedes[1711] jt[1698] daisuke[1679] midwifery[1670] natsu[1636] inoue[1625] bu[1615] dougherty[1613] bugatti[1596] jurek[1593] cctv[1586] hj[1562] sasaki[1560] hà[1539] lifes[1515] wada[1511] masahiro[1508] jacky[1485] chiu[1475] hom[1471] toho[1469] murtagh[1468] otoko[1462] hou[1419] kon[1408] tatort[1403] elim[1372] toki[1359] takashi[1356] itō[1352] berlinale[1348] linkin[1340] brabham[1333] nishikawa[1332] satoshi[1328] takeshi[1327] ichikawa[1323] ananya[1300] aligarh[1296] yo[1294] kadokawa[1289] mahalakshmi[1287] kazuki[1281] chantal[1278] satomi[1276] dadi[1263] yoshio[1258] immortals[1257] jma[1253] nakagawa[1243] cui[1239] uchida[1233] teasers[1228] 
topic72=romanian[22300] romania[12506] bucharest[11770] mohd[7098] balu[6943] rogaland[6286] jalan[6021] esports[5266] taman[5248] iași[4728] alexandru[4554] ion[4471] moldova[4300] moldovan[4270] nicolás[4049] constantin[3836] biathlon[3800] nicolae[3741] chișinău[3174] odessa[3036] bø[2777] vasile[2746] revista[2712] manolo[2657] galician[2638] kotor[2629] nordland[2583] lugo[2503] voz[2501] nadezhda[2495] música[2472] grete[2467] carioca[2384] oficial[2271] transylvania[2270] compostela[2252] marín[2240] galați[2221] mihail[2196] perdana[2172] sevilla[2167] yeovil[2141] ediciones[2051] karlsson[2050] siti[2025] editura[2012] amador[1962] haugesund[1953] liceo[1952] din[1950] tawny[1938] pula[1924] ortiz[1923] vigo[1921] borja[1913] tanjung[1885] galicia[1849] misaki[1834] mircea[1826] bogdan[1821] filho[1811] baru[1810] franziska[1808] nag[1807] rojo[1807] duda[1804] veiga[1800] aik[1796] otero[1791] dimitrie[1783] ríos[1782] seng[1773] dato[1757] moldavia[1751] rika[1731] hikari[1712] elin[1700] montoya[1695] mota[1677] rogelio[1674] televisión[1670] katja[1664] pamplona[1663] hulu[1659] ștefan[1653] jasmin[1653] merlo[1647] ulla[1639] svendsen[1627] viața[1618] salgado[1605] nilsson[1599] kahani[1597] tiraspol[1596] roque[1590] moldavian[1578] ustad[1568] besar[1559] blanco[1557] celta[1555] 
topic73=racing[61878] race[52831] tour[43002] golf[25206] ret[24978] championship[24516] stage[23978] car[23081] driver[19581] colspan[19181] ford[16548] prix[16362] speedway[15767] motorsport[15547] sprint[15405] laps[15243] formula[14896] points[14772] chevrolet[14642] cycling[14120] road[14077] races[13845] lap[13593] motorsports[13454] championships[13411] undrafted[12674] classification[12343] px[12093] pga[12073] grand[12026] gt[11409] standings[11316] honda[11276] overall[11150] rowspan[11090] finished[11048] results[10720] trial[10558] bib[10259] nd[10120] open[10103] rd[10087] circuit[10068] cars[9856] drivers[9730] cup[9287] motorcycle[9269] motor[9223] finish[9189] rally[9181] winner[9175] riders[8832] nascar[8759] strokes[8631] pos[8620] toyota[8490] classic[8431] renault[8121] cc[8101] event[7862] bmw[7831] racer[7278] track[7090] fastest[7041] pole[7034] course[6978] wins[6966] par[6767] challenge[6562] porsche[6319] class[6291] champion[6288] km[6245] stages[6048] gp[5967] holden[5914] giro[5890] heats[5783] raced[5568] driving[5510] raceway[5507] fia[5216] nhra[5136] speed[5135] european[5107] peugeot[5065] finishing[5057] professional[5028] favourite[4995] jersey[4977] ahl[4935] wrc[4766] teams[4738] pf[4702] winning[4694] nissan[4692] audi[4664] pro[4575] pepperdine[4549] bests[4547] 
topic74=wk[10860] barron[8910] netflix[7235] quoins[6680] chloe[6259] mercer[5273] crosby[5172] mcfarland[4816] hart[4752] grimm[4567] macerata[4563] evan[4362] wnbl[4272] rhinos[4034] newman[3969] drake[3852] fiddle[3797] synths[3630] thrones[3624] yds[3618] jerome[3595] maggie[3547] showtime[3519] dexter[3507] trumpeter[3501] polydor[3457] george[3435] sundance[3425] jimmy[3355] stars[3282] nash[3221] melton[3185] hammerstein[3147] tatjana[3147] savage[3144] irving[3135] certifications[3121] gordon[3114] doll[3097] harry[3064] harold[3062] vampire[3056] jazz[3056] tenor[3037] cast[3025] clapton[3017] frank[3014] peterson[3004] arlen[2994] ballard[2985] mack[2956] mccartney[2945] tomatoes[2918] purcell[2916] mastered[2915] joe[2894] ref[2890] meghan[2859] johnny[2855] nickelodeon[2849] flanagan[2841] wook[2793] finn[2768] saxophone[2748] cw[2743] roar[2743] nme[2739] berklee[2718] jonny[2708] weir[2692] rebecca[2691] django[2678] billy[2677] carlson[2672] lucifer[2669] jack[2667] thom[2649] jill[2644] brennan[2641] teddy[2636] lennon[2623] trainor[2620] corbin[2615] maynard[2611] duggan[2577] reeves[2563] ned[2551] walton[2545] lew[2542] torrens[2539] boogie[2533] clarinets[2531] starr[2516] parker[2508] alright[2506] patterson[2480] kenton[2468] freddie[2464] jazztimes[2459] graham[2449] 
topic75=align[68629] myrtaceae[26487] weightlifting[25240] jerk[22803] snatch[22460] tbd[20073] weightlifter[13566] purplish[13000] text[11057] myrtle[8911] till[8443] bar[8019] right[7879] color[6504] powerlifting[6354] ranchi[5916] style[5700] lbs[5065] kitts[4731] width[4163] id[3906] kalpana[3860] bodybuilding[3752] nevis[3638] roundish[3621] hijo[3444] olympia[3365] alyssa[2909] longlisted[2705] ifbb[2684] oligocene[2679] value[2636] schultz[2551] arild[2450] gills[2442] figwort[2398] pettis[2318] squat[2268] godoy[2196] rubiaceae[2172] rafał[2118] legend[2117] baena[2083] naeem[2071] iwf[2052] valentín[2020] arriba[2020] boulenger[2013] kongsberg[2011] nikolov[1923] oskaloosa[1920] shibuya[1902] bakker[1884] lillie[1840] height[1809] paleocene[1784] dinos[1755] damián[1742] shortlisted[1703] rgb[1695] sattler[1643] verdugo[1615] keselowski[1596] yuichi[1577] toth[1567] cosplay[1559] minuta[1551] bodybuilder[1516] barra[1516] papi[1515] fenn[1515] lusk[1508] itis[1507] variably[1487] center[1478] kristoff[1471] shading[1467] carthy[1463] maite[1460] cornejo[1426] malu[1400] sanyal[1398] hibiscus[1396] portela[1385] cauvery[1384] teamsters[1365] velvety[1345] orientation[1333] increment[1323] girdle[1320] physique[1318] tahar[1310] megami[1294] salar[1293] seale[1274] ninjas[1247] lykke[1243] timeaxis[1226] lomond[1224] wildcards[1218] 
topic76=hurling[13052] tanzania[8050] nk[6892] kaur[5747] doha[4922] dar[4595] radiata[4578] judoka[4399] haque[4060] mehdi[3454] llb[3441] makerere[3349] bundaberg[3290] salaam[3110] amman[2958] tanzanian[2911] es[2723] bou[2442] kyu[2252] izumi[2194] yui[2190] hnk[2145] majid[2102] zanzibar[2043] gorica[2013] hamad[1981] meerut[1968] osijek[1961] kano[1845] sava[1827] reece[1805] armbar[1799] aydın[1795] juma[1764] abdulla[1763] bagrat[1709] parsonage[1674] ilija[1612] saki[1595] takeda[1570] naturelle[1475] spurius[1449] hervé[1448] anderlecht[1426] prins[1424] paras[1419] barquisimeto[1382] bamba[1347] zolder[1338] meagher[1328] koda[1303] kaneko[1301] sahil[1299] liwa[1299] miura[1298] thorp[1287] unnikrishnan[1283] haiku[1261] ippon[1237] aga[1223] baghdadi[1223] leica[1216] ragnhild[1209] orrell[1204] asst[1168] roni[1162] charleville[1162] mahmood[1146] kühne[1133] waza[1130] amalfi[1117] tori[1106] charleroi[1092] lancia[1089] grote[1080] marcelino[1078] masa[1076] suzan[1068] castroneves[1061] haren[1055] wickham[1054] melford[1052] taos[1048] tanganyika[1046] manama[1046] kyoko[1039] bronte[1036] ahsan[1035] oni[1034] silke[1007] saigo[1007] swahili[1007] insecta[994] taku[986] audun[982] pastore[975] berchem[969] sugimoto[957] umag[953] fortes[951] 
topic77=norwegian[55424] israel[37991] norway[35185] israeli[33172] oslo[22689] aviv[17411] tel[17358] hapoel[15719] jerusalem[14067] og[13351] palestinian[12041] lebanon[11778] palestine[11298] bergen[9468] fjord[8950] beirut[8821] lebanese[8499] saskatchewan[8314] thorell[8223] stavanger[7957] cfl[7828] syrian[7775] haifa[7772] jewish[7571] hebrew[6959] olav[6782] jordanian[6545] trondheim[5323] calgary[4956] hye[4907] norsk[4553] levi[4482] roughriders[4477] tuc[4284] argonauts[4249] cadastral[4198] gaza[4115] telemark[4106] wnit[4074] haddad[4002] edmonton[3949] winnipeg[3901] eskimos[3761] knut[3661] tromsø[3624] bet[3604] arab[3589] norske[3560] nrk[3540] saskatoon[3474] brampton[3465] stampeders[3463] tikva[3450] petah[3445] mads[3242] idf[3062] beit[3034] moshe[3030] cohen[3015] kirke[3010] ammonite[2994] jaffa[2963] zionist[2885] ole[2846] sham[2825] municipality[2774] regina[2736] bjørn[2734] knesset[2717] alouettes[2715] roadrunners[2687] byes[2682] assad[2655] cats[2634] kristiania[2603] kiryat[2571] arne[2511] netanya[2501] christiania[2484] paus[2467] kjell[2428] johanne[2417] inger[2411] hamas[2371] copse[2370] redblacks[2362] helge[2347] melkite[2330] sidon[2265] bhaskaran[2233] sverdlovsk[2232] iaa[2228] palestinians[2227] terje[2217] nazareth[2173] redistributed[2135] artzit[2110] storting[2099] hezbollah[2098] johansen[2096] 
topic78=ship[63684] navy[54900] ships[40412] boat[34181] naval[31140] vessel[24679] submarine[23670] dnf[22584] islands[21722] class[21198] hms[20871] fleet[18668] vessels[17600] port[17165] gun[16738] sea[16412] guns[16058] crew[15820] island[15657] launched[15257] sailing[15238] admiral[14945] boats[14825] royal[14210] lst[13678] torpedo[12938] submerged[12881] hull[12702] coast[12636] captain[12133] cargo[12081] citations[12031] maritime[12022] papua[11786] commissioned[11464] patrol[11224] shipyard[10999] sunk[10797] sailed[10565] command[10486] french[10421] ss[10338] destroyer[10195] guinea[10103] speed[9977] bay[9885] voyage[9594] laid[9510] convoy[9311] heatseekers[9289] marine[9103] type[9038] pacific[9015] keel[8934] submarines[8745] shipping[8714] tons[8526] cruiser[8484] beam[8289] fitted[8133] uss[8024] sold[7838] squadron[7756] steam[7602] sank[7434] sloop[7381] torpedoes[7240] captured[7210] admiralty[7163] ocean[7128] design[7117] ordered[7018] german[6982] harbour[6958] frigate[6929] draught[6919] deck[6894] length[6796] pounder[6741] sail[6730] arrived[6729] scrapped[6585] engines[6570] renamed[6553] gibraltar[6540] flotilla[6513] construction[6398] merchant[6345] reef[6303] yard[6277] shipbuilding[6266] armament[6248] ferry[6217] bow[6164] atlantic[6150] surface[5986] aboard[5904] transferred[5766] operation[5732] destroyers[5725] 
topic79=satellite[24459] antarctic[22725] intelsat[21189] space[14227] antarctica[13286] earthquake[11480] launch[10728] satellites[10241] storm[9803] mars[8988] utc[8736] nasa[8512] orbit[8430] earth[8387] glacier[8144] abstracting[8069] booklist[8001] crater[7995] tornado[7251] km[6914] spacecraft[6650] polar[6531] diptera[6407] rocket[6398] cyclone[6357] ocean[6335] ice[5947] damage[5802] ukr[5701] magnitude[5677] tropical[5661] geological[5595] launched[5419] scopus[5403] weather[5389] skylab[5254] mission[5242] hurricane[5034] laverne[4690] matadors[4674] expedition[4658] asc[4532] geostationary[4406] research[4405] arctic[4292] geology[4244] station[4181] krasnoyarsk[4150] atmospheric[4010] struck[3881] headland[3840] geophysical[3783] winds[3757] map[3709] iss[3512] payload[3450] greenland[3394] lunar[3314] ene[3311] spaceflight[3267] bsc[3242] climate[3184] seismic[3154] ospreys[3147] tornadoes[3127] intensity[3114] moon[3085] orbital[3076] data[3074] msc[3070] canaveral[3059] cape[3058] maps[2992] survey[2988] depth[2979] scale[2895] kg[2892] mapping[2868] band[2825] occurred[2823] farhan[2789] baia[2784] soyuz[2776] weibo[2773] impact[2763] system[2749] astronauts[2745] communications[2731] peninsula[2711] slough[2698] apollo[2697] koehler[2695] oceanography[2652] aşk[2632] scientific[2605] sandbox[2558] girija[2544] incubator[2535] энциклопедия[2533] murchison[2521] 
topic80=temple[59867] sri[29970] sinhala[23072] god[22275] gabled[20681] ancient[19881] jewish[19579] lankan[19299] hebrew[18544] text[16623] lanka[15110] rabbi[14935] buddhist[13386] inscription[13222] verse[12739] bible[12178] chapter[11930] translation[10968] goddess[10905] king[10824] translations[10112] lord[10060] greek[9957] synagogue[9736] inscriptions[9665] book[9585] isaiah[9564] deity[9469] verses[9328] temples[8916] hindu[8508] bc[8383] religious[8370] manuscript[8324] poem[8245] ad[8055] language[7919] manuscripts[7868] tomb[7829] crowdfunding[7447] yoga[7417] ceylon[7410] written[7267] mythology[7166] buddha[6920] word[6882] testament[6849] jews[6838] sarath[6786] latin[6750] sanskrit[6688] israel[6605] religion[6510] statue[6490] colombo[6385] according[6347] shrine[6277] ritual[6199] gods[6149] tradition[6072] texts[6008] vihara[5848] imma[5805] archaeological[5768] worship[5570] festival[5461] buddhism[5354] spiritual[5352] shall[5348] commentary[5334] ce[5331] maha[5283] translated[5255] extant[5234] christian[5214] son[5209] believed[5199] jerusalem[5136] sacred[5083] codex[5009] prophet[5000] means[4985] form[4958] medieval[4916] period[4838] tamil[4780] dedicated[4780] meaning[4776] origin[4754] rituals[4740] version[4697] composed[4654] jain[4625] said[4606] man[4520] deities[4471] legend[4419] divine[4383] mentioned[4356] egypt[4341] 
topic81=william[58108] sir[54551] married[53479] london[44852] son[41545] henry[37279] thomas[36992] george[36954] daughter[35766] james[34902] mary[33129] australian[31112] royal[30248] charles[30140] england[29537] australia[28183] edward[28108] wife[26559] lord[26497] robert[26450] wales[25954] elizabeth[25938] educated[24313] oxford[24305] queensland[23391] cambridge[22885] father[21559] richard[20869] sydney[20497] adelaide[20319] children[19188] margaret[18739] earl[18399] king[17868] appointed[17463] née[17395] melbourne[16513] brother[16426] mrs[16346] arthur[16328] sons[15707] walter[14458] frederick[14363] lady[14077] baron[13985] society[13963] francis[13946] victoria[13851] alexander[13740] daughters[13665] queen[13635] jane[13567] elected[13515] politician[13482] hugh[13263] smith[13119] edinburgh[13076] buried[12881] brisbane[12484] parliament[12287] samuel[12231] irish[11934] baronet[11877] whom[11820] anne[11566] hall[11395] church[11346] street[11163] david[11163] joseph[11139] sheriff[11103] aged[10936] welsh[10723] perth[10649] lived[10448] alice[10383] goble[10379] estate[10193] clerk[10133] victorian[10105] duke[10085] council[10048] ireland[10009] eldest[9837] secretary[9828] cemetery[9701] moved[9640] bibliography[9636] devon[9636] captain[9526] legislative[9512] peter[9371] councillor[9355] alfred[9347] mp[9234] ann[9184] merchant[9167] frse[9009] jones[8952] marriage[8949] 
topic82=greek[18199] greece[10392] madrasa[9325] gasser[6373] kurdistan[5984] kurdish[5903] thessaloniki[5279] hebei[4734] ef[4450] uaap[4283] bimonthly[3527] vosges[2929] nea[2649] hubei[2600] heilongjiang[2539] georgios[2348] crete[2339] pkk[2318] erzurum[2197] castletown[2149] patras[2107] burgen[2078] brito[2067] vorarlberg[2037] mosul[1873] queenstown[1863] dt[1816] kurds[1814] ioannis[1798] schenkel[1743] urmia[1656] mankato[1634] tripoli[1633] bootcamp[1623] künstler[1550] magpies[1546] bes[1531] ponomarev[1519] basra[1484] benghazi[1472] erbil[1466] nikolaos[1447] bemidji[1442] homonymous[1442] duluth[1430] keck[1413] yazidi[1408] peshmerga[1388] agus[1361] ogham[1343] corfu[1333] iza[1322] dimitrios[1318] ohno[1301] smyrna[1298] marisol[1291] nour[1290] selim[1274] leyla[1267] alona[1262] ano[1260] aegean[1253] shijiazhuang[1251] kyra[1243] jalal[1215] kermanshah[1213] wenzhou[1212] kavala[1183] cooley[1180] österreich[1176] schwab[1154] hy[1149] dara[1127] alexandrina[1106] qazi[1099] milos[1091] nabi[1080] tirol[1079] batra[1074] ioannina[1055] rolle[1044] strang[1042] magh[1042] thessaly[1031] fergus[1029] ionian[1029] scrope[1012] agios[1008] mcgregor[997] boaz[994] chios[980] peloponnese[970] achaea[965] mardin[964] tavistock[940] amrit[922] ilam[918] erdogan[903] natt[902] epirus[894] 
topic83=chinese[65519] china[63445] hong[44555] kong[42105] li[32461] malaysia[30882] taiwan[30324] wang[26335] chen[26005] zhang[22924] liu[18768] beijing[18737] taipei[16889] shanghai[16506] singapore[15356] yang[15281] lin[14596] yu[14089] wu[13459] huang[12099] taiwanese[12030] yuan[11990] wei[11893] malaysian[11797] zhou[11523] tang[10746] chan[10489] wong[10483] zhao[10269] zhu[10261] cheng[10173] lu[9929] ming[9900] penang[9763] kuala[9356] hunan[8829] chang[8738] guangzhou[8652] han[8598] yi[8494] sarawak[8485] wen[8378] lee[8194] macau[8108] asian[7974] sungai[7920] yan[7910] tan[7737] nanjing[7474] selangor[7372] ying[7257] chung[7154] qing[7053] lumpur[6970] asia[6932] zheng[6914] dynasty[6813] chu[6774] xiao[6760] brunei[6738] jiang[6719] ma[6643] guangdong[6550] jin[6488] hui[6196] mongolia[6149] fu[6119] province[6080] chun[6065] thailand[6053] wan[5962] yin[5944] liang[5929] sabah[5863] hu[5735] kampung[5635] ching[5607] republic[5529] johor[5529] zhejiang[5440] bukit[5349] kai[5313] mersin[5261] tong[5249] tai[5225] ho[5209] hua[5194] jing[5176] chiang[5144] gao[5128] chi[5122] mandarin[5090] guo[5033] kota[4906] cheung[4865] mongolian[4862] henan[4849] shi[4777] sichuan[4767] sun[4729] 
topic84=polish[93427] poland[52838] warsaw[37579] ski[23365] slalom[22392] kraków[18287] skiing[15553] vilnius[14536] alpine[13736] lithuanian[13590] cev[12782] plovdiv[11709] andrzej[11534] lithuania[11270] poznań[9815] stanisław[9656] jan[9501] downhill[8964] józef[8434] piotr[8374] voivodeship[8236] wrocław[7951] hs[7736] innsbruck[7686] michał[7616] salzburg[7599] jerzy[7184] gdańsk[7143] lublin[7106] łódź[6650] maciej[6633] skier[6586] kazimierz[6508] szczecin[6495] paweł[6494] tadeusz[6487] wisła[6464] krzysztof[6419] canoe[6298] minsk[6297] cross[6230] zindagi[6123] wojciech[5985] henryk[5978] marcin[5861] tomasz[5789] władysław[5770] polonia[5643] giant[5600] jacek[5521] łukasz[5486] aleksander[5443] linz[5437] jakub[5364] silesian[5191] winter[5008] franciszek[4958] marek[4933] austrian[4737] witold[4635] zbigniew[4633] silesia[4572] stal[4571] alps[4496] cherno[4459] karpaty[4444] stara[4439] adam[4438] polska[4424] tyrol[4369] sprint[4363] styria[4307] mountaineering[4287] karol[4248] austria[4242] lwów[4174] ghetto[4113] stefan[4084] grzegorz[4068] zagora[4058] heo[4045] cracow[4030] ewa[3981] katowice[3971] antoni[3964] ryszard[3943] agnieszka[3937] bastogne[3919] sejm[3865] poles[3857] lech[3789] jagiellonian[3782] garmisch[3772] jumping[3713] azs[3707] góra[3706] pomeranian[3698] partenkirchen[3616] ukrainian[3500] polski[3461] 
topic85=species[253309] genus[102197] mm[88244] forewings[76027] moth[70521] hindwings[67044] described[62031] wingspan[60098] dark[60072] grows[59966] grey[57691] flowers[56789] costa[56485] brown[47868] description[47822] australia[47817] whitish[47224] yellow[44206] endemic[43604] pale[42811] white[42627] marine[42097] distribution[40628] plant[39766] leaves[33514] western[31663] black[31192] dorsum[30684] base[30352] caladenia[29181] apex[28969] habitat[28895] sea[25712] length[25518] scales[25182] occurs[24995] spiders[24664] middle[24407] shell[23871] dorsal[23838] spot[23596] flowering[23298] height[22965] mollusca[22424] extinct[21458] native[21214] typically[20841] larvae[20816] tree[20709] leaf[20679] dots[20664] wide[20575] plants[20516] shaped[20310] cm[19854] taxonomy[19378] red[19219] genera[18811] erect[18331] wing[18205] basal[18054] commonly[17898] specimen[17681] colour[17561] fossil[17445] apical[17186] pink[16579] recorded[16339] gastropoda[16192] cream[16130] adults[16107] cell[15941] orange[15809] slightly[15700] green[15595] plical[15425] edge[15182] specimens[14721] reddish[14666] africa[14574] spider[14519] narrow[14509] petals[14464] regions[14411] beyond[14391] sandy[14241] flower[14159] disc[14139] southern[14036] veins[13995] males[13914] irregular[13901] light[13857] fruit[13855] contains[13801] subfamily[13588] belonging[13574] tall[13506] hairs[13499] posterior[13478] 
topic86=la[64653] spanish[52400] del[49322] italian[45906] el[45787] josé[34142] di[33055] juan[32014] spain[29987] san[27195] maría[26311] madrid[25332] división[25011] luis[24600] antonio[23787] carlos[21889] argentine[21017] argentina[20735] indonesia[19364] manuel[18463] chile[18216] buenos[18211] aires[18101] italy[16817] miguel[15970] santiago[15849] mexican[15763] pedro[15325] mexico[15097] indonesian[14908] gonzález[14717] en[14115] garcía[14097] peru[13962] los[13921] fernando[13775] alberto[13039] francisco[12990] il[12803] roberto[12585] santa[12478] rodríguez[12409] las[12264] barcelona[11820] jorge[11756] lópez[11750] fernández[11682] della[11524] nacional[11423] mario[11043] martínez[11021] rafael[10939] valencia[10862] jakarta[10799] colombia[10618] lima[10513] sánchez[10347] cruz[10137] quechua[10130] león[10080] rey[10059] franco[9779] chilean[9640] bolivia[9465] rosa[9396] pérez[9348] pablo[9323] berghahn[9126] universidad[9053] diego[9003] martín[8901] province[8881] domingo[8881] amor[8751] una[8742] sergio[8733] alejandro[8721] enrique[8679] marco[8585] basque[8549] javier[8549] cf[8516] carlo[8437] castro[8414] salvador[8359] giuseppe[8325] andrés[8102] eduardo[8100] ángel[8090] concession[8081] maria[8045] ana[8015] alfonso[7993] real[7990] toledo[7956] un[7918] córdoba[7753] paolo[7584] santo[7536] rosario[7507] 
topic87=serbian[43082] serbia[31513] albanian[25168] croatian[24905] belgrade[21130] bosnia[20467] herzegovina[18061] albania[17402] fiba[17081] croatia[16740] yugoslav[15061] futsal[14993] yugoslavia[14987] kosovo[14855] zagreb[13447] slovenian[12505] montenegro[11604] bosnian[11468] macedonia[10773] macedonian[10248] slovenia[9954] montenegrin[8559] novi[7977] nikola[7797] ita[7458] verandah[6907] ger[6524] barangay[6360] gbr[6173] serbs[6164] ljubljana[6084] vojvodina[5810] skopje[5795] swe[5750] serb[5728] luka[5671] moto[5456] jpn[5436] rijeka[5232] eurocup[5202] fra[4994] zvezda[4988] aleksandar[4686] dallara[4495] balkan[4310] marko[4310] podgorica[4289] bagan[3819] albanians[3669] ned[3502] aus[3499] warszawa[3496] republic[3474] sad[3447] ivan[3411] arg[3376] pts[3337] cze[3336] orf[3184] olimpia[3090] espanyol[3090] dušan[3009] mujeres[3003] slovene[2998] maribor[2993] croats[2991] basketball[2944] spa[2926] za[2888] vardar[2700] aut[2697] cyrillic[2682] outbuilding[2638] stefan[2607] mirna[2602] josip[2599] zoran[2552] radhika[2550] kumanovo[2537] olimpija[2508] pazar[2476] milan[2432] prvaliga[2378] pristina[2374] lazar[2346] esp[2342] republika[2333] motogp[2330] mal[2308] nll[2301] righthanded[2283] bulgaria[2276] grinstead[2218] por[2167] slobodan[2165] rsm[2120] matej[2079] mostar[2073] podium[2051] clapboards[2047] 
topic88=bulgarian[24878] serie[17571] sofia[14389] aarhus[13646] bulgaria[12981] ukrainian[12680] kyiv[8063] donetsk[7817] oblast[7628] banca[7476] kharkiv[5878] italia[5725] maccabi[5713] dynamo[5380] sakha[5275] italian[4960] ukraine[4887] brescia[4824] roma[4719] paok[4564] matchday[4522] stanbul[4501] levski[4413] coppa[4348] eintracht[4278] oleksandr[4249] varna[4194] vidyalaya[4104] спб[3993] juventus[3911] bayern[3773] milan[3744] milano[3720] superleague[3638] italy[3596] radnički[3520] di[3456] siva[3378] goalkeeper[3376] zenit[3325] venezia[3285] niš[3237] torino[3212] chernihiv[3180] dnipro[3172] concessionaire[3163] napoli[3123] coadjutor[3077] slovan[3059] dnipropetrovsk[2946] primavera[2938] fc[2891] shakhtar[2890] europa[2861] bergamo[2790] lazio[2777] tavares[2757] stadion[2740] pilipinas[2711] viljandi[2706] ateneo[2645] vicenza[2630] toto[2618] nandini[2600] pescara[2556] galatasaray[2550] tarnovo[2470] treviso[2428] sudha[2406] perugia[2398] bilal[2383] werder[2359] satyanarayana[2338] kaluga[2331] fenerbahçe[2319] verona[2308] burgas[2299] psv[2287] marini[2281] locality[2244] betis[2207] lynsey[2197] borisov[2187] loaned[2163] admira[2148] unni[2045] genoa[2004] nika[1989] hbf[1981] aleksey[1970] padova[1958] grasshopper[1952] cisterns[1947] beda[1921] fsv[1918] sturm[1901] nac[1892] viktoria[1850] cristian[1829] todor[1826] 
topic89=shrub[40829] discal[22612] orchid[15459] sepals[14596] labellum[13887] suffusion[13199] soils[12592] esperance[11960] hairy[11638] subgenus[10143] chancel[8933] epithet[8677] tinged[8459] subsp[7814] purple[7544] cloudy[6903] glabrous[6640] huskies[6014] sepal[5924] florets[5350] subspecies[5050] stamens[4884] deciduous[4798] connecticut[4663] aff[4291] herbarium[4272] geraldton[3951] sportive[3870] orchidaceae[3671] litchfield[3476] loamy[3333] creamy[3332] downwards[3100] tasmania[3036] petals[3009] glandular[2986] aisle[2713] vestry[2621] daisy[2573] petal[2545] spike[2431] acer[2247] basally[2191] borne[2156] woody[2137] conidae[2126] succulent[2126] leaflets[2100] flowered[2089] edges[2053] murugan[2049] mangrove[2040] saheb[2027] tuft[2006] raceme[2004] bracts[1999] prostrate[1965] markings[1924] bačka[1898] sacristy[1897] cones[1846] grosseto[1843] bushy[1837] woolly[1827] srikanth[1759] obovate[1746] quinnipiac[1722] konak[1696] horticulture[1680] ypg[1671] avon[1667] calyx[1652] bridgeport[1646] praveen[1596] fleshy[1556] elongate[1547] deadpool[1544] stalks[1527] petioles[1525] latif[1506] shingles[1497] trang[1471] glebe[1469] leathery[1468] shrubs[1465] bog[1435] storrs[1416] abou[1407] calcareous[1401] tapering[1399] underside[1390] fern[1358] rhizomes[1356] allium[1339] stipe[1336] solidago[1334] lobed[1317] monotypic[1312] fallujah[1304] creeper[1290] 
topic90=art[150497] museum[89238] gallery[56646] arts[54665] book[54202] painting[53902] works[53395] artist[53308] collection[46327] exhibition[45469] poetry[41290] magazine[40565] painter[39899] books[38773] novel[37909] artists[37145] paintings[34245] writer[33939] literary[31673] exhibitions[30051] literature[27964] stories[27832] award[27070] prize[26664] fiction[26242] library[26084] fine[25938] studied[25685] poems[25288] author[25180] women[25084] poet[24602] london[24271] editor[24167] worked[24116] portrait[23934] sculpture[23916] contemporary[23813] collections[23711] exhibited[23144] isbn[22700] design[21884] writing[21874] photography[21311] wrote[21009] academy[19564] fashion[19252] press[19033] paris[18530] children[18053] novels[17918] biography[17884] writers[17668] short[17372] newspaper[16980] photographer[16827] portraits[16702] publishing[16642] jpg[16028] culture[15909] publications[15543] society[15364] story[15362] journalist[15267] artistic[15011] painted[14994] awards[14916] moved[14621] cultural[14224] designer[14128] publication[13730] taught[13663] modern[13552] publisher[13397] visual[13302] sculptor[13274] institute[13104] father[12990] style[12966] studio[12544] curator[12410] edition[12387] working[12372] photographs[12337] festival[12275] paper[12265] landscape[12162] magazines[11965] edited[11927] critic[11912] created[11825] drawing[11792] novelist[11769] graphic[11756] woman[11734] translated[11734] illustrator[11673] married[11591] young[11580] creative[11571] 
topic91=ap[24037] bulldogs[12630] pac[10588] drexel[8789] intercollegiate[7965] comédie[6783] pdc[6692] sacks[6627] auburn[6497] caa[5989] sacramento[5821] byu[5297] receptions[4567] nxt[4524] collegiate[4415] bengals[4352] jaguars[4147] offense[4080] tulane[3822] tbc[3742] kickoff[3189] waived[2893] recruiting[2859] td[2846] fumbles[2757] crimson[2723] linemen[2652] scrum[2652] linebackers[2625] quarterbacks[2501] statesboro[2498] sfl[2449] estudiantes[2445] fl[2417] maly[2400] tiebreaker[2363] youngstown[2335] subba[2304] aris[2295] dartmouth[2281] tar[2270] frazione[2225] karina[2223] garonne[2187] strayhorn[2089] quintana[2066] int[2036] calle[2011] beasley[2000] tú[1974] devils[1973] mattia[1963] buckeyes[1886] suter[1871] sed[1853] sera[1849] fabienne[1832] tiebreakers[1826] selby[1808] mejor[1772] michèle[1766] reardon[1733] zimmerman[1707] sivakumar[1672] ident[1648] jusqu[1598] balestier[1557] colima[1531] defensed[1526] rereleased[1518] neves[1514] pritchard[1499] dillon[1481] meek[1474] graff[1444] heathcote[1442] poonam[1438] matías[1429] lombardi[1427] bulldog[1427] imagen[1422] brigham[1413] dl[1410] biagio[1407] sofía[1405] tomlinson[1399] valdosta[1373] folsom[1365] ubc[1355] shingo[1349] donahue[1333] dons[1315] corriere[1303] bree[1293] dodson[1292] limoges[1289] kemp[1288] aldridge[1277] ángeles[1272] stp[1268] 
topic92=india[102017] indian[74393] village[61563] fuscous[61128] pradesh[48007] singh[46439] tamil[41309] workers[36281] district[34999] population[33637] punjab[32825] rao[26056] literacy[25563] marginal[23217] sabha[23071] sri[22832] delhi[22383] uttar[22291] kerala[22142] census[21797] maharashtra[20994] nearest[20833] tehsil[20817] km[19869] telugu[19609] villages[19479] chandigarh[19217] janata[18838] kumar[18771] constituency[18709] malayalam[17984] andhra[17585] karnataka[17122] kapurthala[17007] bharatiya[16897] caste[16520] mumbai[16452] hindi[16084] raj[15043] ram[14816] bengal[14558] kannada[14537] krishna[13750] nadu[13667] rate[13484] chennai[13399] congress[12828] females[12690] prasad[12449] sharma[12367] demographics[12188] assam[12116] males[12015] male[11788] away[11787] marathi[11749] devi[11726] lok[11545] assembly[11543] airport[11437] madhya[11400] guru[11186] goa[11166] nagar[11159] punjabi[11049] bihar[11033] headquarter[10739] hyderabad[10555] bangalore[10365] average[10312] schedule[10220] legislative[10073] block[10057] female[9986] rajasthan[9956] per[9846] shankar[9651] temple[9600] streak[9574] gujarati[9488] mysore[9392] shri[9327] tribe[9268] composed[8964] shiva[8863] ratio[8626] reddy[8608] telangana[8581] bjp[8480] odisha[8356] rupees[8329] haryana[8203] ravi[8126] children[8125] labourers[7917] chandra[7879] hindu[7740] scheduled[7712] language[7470] sex[7454] 
topic93=swedish[50639] danish[37016] finnish[33286] sweden[31572] finland[25469] denmark[22607] copenhagen[19946] stockholm[19301] helsinki[12805] lagos[10783] pickard[9379] hansen[8688] nordic[8255] johan[7806] gothenburg[7090] townsville[6903] nrl[6837] greenland[6639] jensen[6353] norwegian[6167] erik[6119] anders[5858] lars[5660] norway[5538] carl[5490] andersson[5474] turku[5444] tampere[5239] henrik[5127] gustaf[5059] nielsen[5026] boca[4804] tomé[4797] magnus[4657] frederik[4541] svenska[4538] jens[4505] sven[4381] axel[4376] lahti[4268] bandy[4222] nils[4181] af[4103] hans[4019] johansson[4008] niels[3959] dansk[3941] olsen[3932] penrith[3766] warrington[3763] mineiro[3682] rochdale[3657] den[3592] pokal[3554] larsen[3527] faroe[3505] chesterfield[3489] gascoyne[3431] rasmus[3421] gunnar[3338] illawarra[3334] príncipe[3296] rotherham[3266] roosters[3217] eriksson[3172] sami[3168] accrington[3154] om[3127] bengt[3123] sofie[3054] ab[3046] colo[3008] scandinavian[2998] med[2976] leif[2975] björn[2941] en[2898] dahl[2889] svensson[2878] arne[2870] wakefield[2845] pekka[2814] primera[2788] andreas[2760] rovers[2748] ludvig[2737] helsingør[2610] ifk[2580] ratcliffe[2574] lindberg[2546] ola[2532] svalbard[2518] soares[2499] manly[2495] jul[2432] bengtsson[2401] lise[2380] göteborg[2361] åland[2352] jonas[2351] 
topic94=mf[92936] df[73533] fw[71987] aircraft[66593] airport[44520] air[42246] gk[33049] flight[29074] wing[28045] engine[25589] aviation[24055] soccerway[23381] squadron[22859] pilot[22179] cb[20868] glider[16985] pilots[16919] design[16147] cm[14986] flying[14823] airlines[14200] lb[14159] model[13945] raf[13310] rb[13139] designed[12589] weight[12431] airline[11646] airports[10365] rw[10352] force[9816] yacht[9589] lw[9506] cf[9492] fuselage[9384] missile[9305] cylinder[9148] specifications[9080] fighter[9012] span[8951] landing[8929] flights[8823] ratio[8581] range[8446] transport[8431] mandals[8343] fly[8302] crash[8224] engines[8217] undisclosed[8214] crew[7804] boeing[7737] diesel[7670] goalkeepers[7604] helicopter[7428] development[7309] aspect[7210] powered[7169] production[7105] base[7060] radar[7034] mounted[6809] training[6808] airfield[6733] operational[6578] jet[6560] free[6550] vehicle[6541] crashed[6534] flew[6529] rudder[6520] sized[6442] ourairports[6416] plane[6360] yachts[6226] passengers[6206] cells[6199] certified[6169] gear[6102] wings[6045] dm[5943] operations[5914] fuel[5799] vehicles[5713] cockpit[5700] squadrons[5646] transfer[5607] mirage[5585] fleet[5572] speed[5454] airways[5453] type[5354] zimbabwe[5347] produced[5343] accident[5306] propeller[5283] fixed[5266] rm[5203] hb[5002] tank[4995] 
topic95=fiji[6459] cotta[4755] antalya[4267] karthik[3835] sundar[3742] burundi[3621] hartlepool[3414] mcfarlane[3286] fijian[3244] maricopa[2823] agnew[2821] workington[2821] lichfield[2408] madhavan[2367] pinkney[2323] prebendary[2321] minogue[2212] broughton[2189] kylie[2188] vanuatu[2109] burundian[2094] carafa[2054] mara[1955] flo[1931] fragmenta[1895] hedley[1888] sidhu[1826] nomen[1799] ochraceous[1780] suva[1738] lidia[1728] plebs[1711] nicholls[1679] clemons[1648] burnett[1636] lucian[1628] sextus[1627] goff[1605] kupfer[1589] feroz[1538] sanju[1526] plenipotentiary[1520] kristine[1502] bde[1478] rajat[1476] spiro[1441] yeomanry[1425] jayaprakash[1397] wea[1394] eliminator[1385] sudarshan[1364] kincaid[1355] craiova[1347] wheaton[1344] waco[1334] plebeian[1334] mohabbat[1327] danforth[1321] brandi[1310] whisper[1303] elke[1289] leary[1281] decorah[1279] ako[1251] ramazan[1238] theodosia[1232] gonzales[1228] dayna[1207] philippi[1189] bahu[1181] praenomen[1151] prod[1103] paca[1061] septimus[1061] bujumbura[1058] arusha[1050] conservator[1032] crassus[1013] horváth[993] liviu[982] englisch[975] mccullough[973] frankel[964] nadi[954] solis[942] pompeius[937] tetyana[905] chisnall[893] tpb[892] cyndi[881] mikel[862] menderes[859] bodrum[853] rusk[850] harriett[849] alli[848] yoshioka[839] aurelia[828] spartacus[828] dumitrescu[822] 
topic96=greyish[20228] energy[15027] system[14604] water[13820] mm[13174] using[12704] materials[12224] sprinkled[11925] optical[11837] surface[11739] systems[11432] design[10580] model[10321] temperature[10260] process[10200] light[10087] magnetic[9542] material[9362] type[9206] low[9028] different[8533] heat[8491] method[8277] lens[8060] pressure[7897] metal[7786] carbon[7676] laser[7592] flow[7554] malware[7546] chamfered[7522] speed[7380] air[7314] test[7294] termite[7277] applications[7277] gas[7243] radiation[7238] power[7218] instrument[7200] control[7192] particles[7173] body[7109] electric[7108] production[7103] physics[7068] device[7040] developed[6959] models[6955] range[6914] technology[6690] technique[6667] vehicle[6661] signal[6656] physical[6634] designed[6600] field[6528] liquid[6495] streak[6430] machine[6402] size[6386] patent[6383] frequency[6380] effect[6366] steel[6347] devices[6297] nuclear[6236] layer[6192] standard[6158] mass[6099] electrical[6073] electron[6057] components[6043] uses[6031] plastic[6000] similar[5947] mechanical[5926] metadatabase[5903] equipment[5876] soil[5855] tube[5821] measurement[5789] techniques[5686] dive[5684] particle[5656] solar[5652] instruments[5651] polymer[5643] weight[5598] methods[5565] laboratory[5563] manufactured[5545] quantum[5516] axle[5499] plasma[5487] data[5485] beam[5470] thermal[5462] processes[5442] corrugated[5382] 
topic97=la[55765] le[54186] des[44276] du[41316] et[39682] paris[39461] les[38004] french[36498] france[23880] jean[20375] sur[17159] éditions[16258] théâtre[14794] prix[14576] en[14337] pierre[14210] saint[13256] école[11868] un[11634] ligue[11538] quebec[10960] au[10830] histoire[10790] une[10766] ou[10702] dans[9758] française[9639] michel[8844] jacques[8424] pour[8228] académie[8204] superliga[7843] françois[7586] fr[7528] henri[7457] georges[7247] algerian[7054] société[6932] français[6634] dictionnaire[6497] andré[6307] lycée[6245] claude[6059] monde[6020] est[5963] montreal[5957] nationale[5902] louis[5788] marie[5713] rue[5711] musée[5529] aux[5266] avec[5220] deux[4927] par[4889] petit[4426] siècle[4295] musique[4217] rené[4216] alain[4168] études[4164] sous[4152] pas[4150] université[4128] amour[4113] qui[4001] beaux[3974] eugène[3954] homme[3906] pincode[3810] canton[3727] montréal[3668] galerie[3667] monaco[3651] grand[3595] nouvelle[3582] algeria[3537] seine[3509] je[3502] temps[3483] bibliothèque[3483] strasbourg[3482] hôtel[3435] superdraft[3414] nuit[3398] grasset[3392] femme[3366] mer[3337] lausanne[3320] ville[3202] émile[3198] seuil[3133] hélène[3124] auguste[3083] nantes[3062] trois[3040] nouvelles[3018] paul[2992] historique[2977] palais[2966] 
topic98=taluka[17523] vijay[13273] raja[11467] babu[11436] panchayat[11055] prakash[8691] soundtrack[8529] taluk[7855] arjun[7137] joshi[6951] gujarat[6631] vijaya[6384] sahitya[6366] nrhp[6275] ramesh[6044] mukherjee[5965] nair[5932] mangalore[5329] chatterjee[5007] laterite[4830] akademi[4783] leela[4737] rajesh[4692] ganesh[4682] vikram[4629] playback[4568] sai[4481] leung[4418] rahul[4374] manoj[4303] jai[4219] deva[4187] bengaluru[4065] lam[4046] narayan[4021] pvt[3910] vinod[3873] prabhu[3839] ghosh[3805] banerjee[3794] ranga[3770] pooja[3741] mansard[3733] mahesh[3600] sanjay[3501] varma[3498] desai[3476] jaya[3396] flugelhorn[3389] wai[3364] samajwadi[3349] gopal[3294] mahendra[3254] rajya[3243] vivek[3227] anil[3210] sarkar[3185] directorial[3142] jeevan[3063] filmfare[3050] madhu[3020] abhishek[2923] rajiv[2909] bombay[2799] kala[2789] shyam[2782] lai[2719] ka[2717] aditya[2670] thomasville[2667] yuen[2667] lakh[2650] ovc[2645] rishi[2610] roy[2554] siu[2551] veena[2503] erigeron[2403] prasanna[2377] paschim[2365] cbse[2338] avengers[2314] ghats[2299] grossed[2244] rekha[2235] doordarshan[2215] ramanathan[2215] gandhi[2214] srinivasa[2213] jag[2190] kumar[2175] allahabad[2159] nikhil[2159] kolhapur[2152] dialogues[2107] kung[2068] srivastava[2065] vasantha[2042] indian[2035] shekhar[2035] 
topic99=série[23900] primera[15962] benfica[12387] spartans[9105] académica[8667] basket[7476] gd[7380] estádio[7267] sf[6919] xxx[6868] upi[6102] taça[6095] hoosiers[5591] rsssf[5387] desportivo[5068] wnba[4926] murali[4572] greek[4406] boavista[4246] bajnokság[4126] braga[3931] qatari[3817] chainsmokers[3751] trofeo[3707] rower[3459] libertadores[3216] fours[3168] trojans[3057] mx[2955] unc[2954] apertura[2901] athens[2863] supercup[2847] sg[2711] inna[2573] fiu[2538] ahly[2480] dozois[2348] guadalajara[2327] clausura[2323] zeus[2315] divisão[2280] pf[2280] xx[2239] cruzeiro[2214] bernardi[2209] ghazal[2187] argos[2151] uber[2074] pella[2063] națională[1994] unam[1929] ppg[1903] spartan[1894] zim[1853] prep[1830] mythology[1807] román[1803] pfc[1800] ue[1785] coxed[1769] aguirre[1763] araújo[1711] coxless[1702] pg[1691] substitutions[1688] trojan[1685] copa[1681] mangala[1653] scorer[1646] apollodorus[1634] ekstraklasa[1619] returner[1583] nymph[1566] mexicali[1558] vieira[1556] ap[1549] jk[1539] azul[1531] jahrhundert[1528] attica[1505] ammar[1500] sprinting[1491] goalkeeper[1485] ribeiro[1479] toluca[1472] dungeon[1471] celina[1465] starter[1463] diogo[1463] gujrat[1451] asociación[1446] ga[1443] lucero[1437] salah[1436] ucsb[1429] shihab[1425] apg[1410] maia[1405] thiem[1404] 

I find these Wikipedia topics pretty good and clear. More data obviously gives better topics. I am still running the coherence metrics for Wikipedia. Even though u_mass is supposed to be the faster one, it took me 4 days to run it for just the 25-topic model. So it would take me weeks to run it for all the topic counts from 25 to 200. If I ever finish it, maybe I will post an update.

I am sure there would be lots of interesting things to explore in Wikipedia by increasing the topic counts, looking at the relations between topics, how they evolve as the numbers increase, and so on. Unfortunately, I am not paid for this and have too many other things to do.

So if I want to apply topic models, what would I do right now (NLP is getting lots of attention so who knows in a few years..)? Try a number of different topic distributions and parameters if possible, look at the models manually both in text and visually, and pick a nice configuration. Depends really if the topics are used for human consumption as such or just as some form of automated input.

If I needed to model large numbers of separate sets that are evolving over time, I might just use the coherence metrics along with some heuristics (e.g., number of docs vs number of topics) to make automated choices, run the things as micro-services at intervals and use the results automatically. Tune as needed over time.

Fewer and more static sets might benefit from more tailored approaches.

Too long post, too much to do.

Giving Go a Go by forwarding some TCP

Problem? I needed to forward some TCP connections to two different locations (one stream to two destinations). Trying out Golang had been on my to-do list for a while, so I decided to give it a Go. Previously, I have implemented a similar TCP forwarding tool in Java. Installing the full JVM to run some simple TCP forwarding seemed a bit silly, so I figured I could just try having a Go at it as well.

The code I wrote can be found on Github.
To summarize, this is what it does:

  1. Open a socket to receive the initial connections to forward.
  2. When a connection is received (call it source connection) that needs to be forwarded
    • open a socket to forwarding destination
    • start a go-routine that reads from the source socket and writes to the destination socket
    • start a go-routine that reads from the destination socket and writes to the source socket
    • both of these go-routines share the same functionality:
    1. read at max N bytes into buffer
    2. write the data from buffer to destination socket
    3. if mirroring for that direction is enabled, write it also to mirror socket
    4. if logging to file is enabled, write the data to file as well

Of course, there are a number of similar Go projects out there, such as 1, 2, 3, 4, 5, etc. Not quite what I was looking for, and most importantly not invented here :). It's good to try some Go anyway.

After looking at all that, maybe the right way would be to Go with the (package? function? object? oh dear, I am lost already) TeeReader. But I used regular old buffering anyway. Naughty, I am sure, but please Go tell me why (comments etc.).
I used Jetbrains Gogland, which is a nice IDE for Go. They didn’t even pay me to advertise it, my bad.

So what did it end up looking like? What did I think about it? Did I learn anything from all this? What should I remember the next time but will surely have forgotten so I could look up here? What could you all correct me about?

The configuration “object” of mine:

//Configuration for the forwarder. Since it is capitalized, should be accessible outside package.
type Configuration struct {
	srcPort int //source where incoming connections to forward are listened to
	...
}

(WordPress claims to support Go syntax highlighting but for me it just breaks it completely so I set it to text for the snippets here)

Go does not seem to have classes or objects but uses more C-style structs to store data. Code is then put into a set of packages, with paths on disk defining which one you are actually referring to when importing. Surely this seems odd considering all the years of being told how great object-oriented stuff is. But I can see how keeping things simple and setting clear conventions makes it much nicer and maybe even helps avoid people writing too many abstraction layers where not needed. And visibility is decided by whether the name starts with a capital letter. Why not. It just takes some getting used to. Moving on.

For parsing command line arguments, Go comes with a reasonably nice looking “flag” package. But it is quite limited in not making it possible to create long and short versions of the parameter names. Also, customizing the help prints is a bit of a hassle. Maybe that is why there seem to be oh so many command line parsing libraries for Go? Like 1, 2, 3, etc.
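
One workaround that the standard “flag” package does allow, as far as I can tell, is to register two flag names bound to the same variable. A small sketch, where the long name “source-port” and the function parsePort are just my inventions for the example:

```go
package main

import (
	"flag"
	"fmt"
)

// parsePort registers the same variable under a short and a long flag name.
// Both names are illustrative; only "sp" exists in my actual tool.
func parsePort(args []string) (int, error) {
	fs := flag.NewFlagSet("goforward", flag.ContinueOnError)
	var srcPort int
	fs.IntVar(&srcPort, "sp", 0, "Source port for incoming connections (short form).")
	fs.IntVar(&srcPort, "source-port", 0, "Source port for incoming connections.")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return srcPort, nil
}

func main() {
	port, err := parsePort([]string{"--source-port=9090"})
	fmt.Println(port, err)
}
```

The downside is that the help output then lists the two names as separate entries, so it is not quite the same as real short/long support.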

In the end, I did not want anything hugely complicated, and the external libs did not get me excited. So I just used the FlagSet from Go’s standard libs:

	flagSet := flag.NewFlagSet("goforward", flag.ExitOnError)
	flagSet.SetOutput(os.Stdout)

	//this defines an int flag "sp" with default value 0 (which is treated as "undefined")
	srcPortPtr := flagSet.Int("sp", 0,"Source port for incoming connections. Required.")
...	
	if len(os.Args) == 1 {
		fmt.Println("Usage: "+os.Args[0]+" [options]")
		fmt.Println(" Options:")
		flagSet.PrintDefaults() //this nicely prints out the help descriptions for all the args
		os.Exit(0)
	}
...	
	Config.srcPort = *srcPortPtr //getting the flag data is this simple, which is nice
...

Go also comes with a pretty nice logging package. Surprisingly it is called “log”.

My amazingly complex setup for logging to file/console at the same time:

	if Config.logFile != "" {
		f, err := os.OpenFile(Config.logFile, os.O_RDWR | os.O_CREATE | os.O_APPEND, 0666)
		if err != nil {
			//the Fatalf function exits the program after printing the error
			log.Fatalf("Failed to open log file for writing: %v", err)
		}
		if Config.logToConsole {
			//log to both the console and the file
			log.SetOutput(io.MultiWriter(os.Stdout, f))
		} else {
			log.SetOutput(f)
		}
	} else {
		if Config.logToConsole {
			log.SetOutput(io.MultiWriter(os.Stdout))
		}
	}

I like the concurrency mechanism in Go. It is quite nice. But, again, it requires some getting used to. Just call “go functionname” to start a goroutine (a lightweight thread) that runs that function concurrently. We can also call “defer statement” to have “statement” executed when the current function returns.

For example:

	listener, err := net.Listen("tcp", "localhost:"+strconv.Itoa(Config.srcPort))
	defer listener.Close()

Of course, this is also a bit confusing at the beginning. If I do:

func StartServer() {
	listener, err := net.Listen("tcp", "localhost:"+strconv.Itoa(Config.srcPort))
	defer listener.Close()
}

The StartServer function will exit immediately, so the deferred call runs and the listener is closed. From the language viewpoint this works as intended, of course. It just got me at first, because it is not what I expected of my program :).

Or this:

func main() {
	forwarder.ParseConfig()
	go forwarder.StartServer()
}

What will happen when program execution starts from main()? It will start the goroutine (call StartServer in a thread). Or maybe not, if the goroutine is not fast enough. The program exits right after the “go forwarder.StartServer()” call, so most likely StartServer() never runs. You need to block the main thread, because goroutines seem to be more like daemon threads in Java and will not keep the program running once main() returns.
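
One way to block main until the goroutine is done is a sync.WaitGroup from the standard library. A minimal sketch (runAndWait is a made-up helper name, and the printing function stands in for StartServer):

```go
package main

import (
	"fmt"
	"sync"
)

// runAndWait starts fn in a goroutine and blocks until it has finished.
func runAndWait(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // signal completion when fn returns
		fn()
	}()
	wg.Wait() // without this, the caller could return before fn runs
}

func main() {
	runAndWait(func() { fmt.Println("server goroutine ran") })
}
```

For a server that should run forever, simply calling StartServer() directly from main (without “go”) would of course do the same job.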

Or I can do this:

	for {
		mainConn, err := listener.Accept()
		defer mainConn.Close()
		//start a new thread for this connection and wait for the next one
		go forward(mainConn)
	}

which would likely lead to a resource leak, as new connections would keep getting created but never closed. The for loop never exits, so the deferred calls never run.

So then the question, how do you do thread pooling in Go? Seems like this. Actually quite a nice and simple way to get it done. Just another part that needs a different way of thinking. You set up some goroutines (as in threads), have them wait on channels, pull jobs from the channels when available, then run them in the goroutine(s), and wait for more on the channel. Possibly return values through a channel as well.

Channels are a nice concept. But they do make for some weird looking code when starting to Go. As do many other things actually. I guess it is the Go approach to try to be “simple” and terse. Maybe it grows on you.
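
To make that a bit more concrete, here is a minimal sketch of such a worker pool. The job (squaring numbers and summing the results) and the name sumOfSquares are invented for the example. The channel pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares fans the inputs out to a fixed number of worker goroutines
// over one channel and collects the answers from another.
// workers is assumed to be at least 1.
func sumOfSquares(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs { // wait for work until jobs is closed
				results <- j * j
			}
		}()
	}

	go func() { // feed the jobs, then signal that no more are coming
		for _, in := range inputs {
			jobs <- in
		}
		close(jobs)
	}()

	go func() { // close results once every worker has finished
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4}, 2))
}
```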

Some of my weirdest moments:

Allocate a byte array of size 1024

	buf := make([]byte, 1024)

For some reason the brackets are to the left. I read somewhere at some point that Golang reads from left to right. Maybe that is why? But would it be so bad to say “a byte array” instead of “array of bytes”? At least that would not break the minds of programmers who used most of the mainstream languages out there.

Why “make”? Is it for some historical reason from C or something? Apparently there is also a built-in function called “new”, and sometime somewhere someone has thought about combining these (http://stackoverflow.com/questions/9320862/go-why-would-i-make-or-new). Anyway, seems like some unnecessary mental overhead for me.
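
As far as I have understood the difference: “make” gives you an initialized slice, map or channel that is ready to use, while “new” gives you a pointer to a zeroed value of any type. A tiny sketch (the function name makeVsNew is just for this example):

```go
package main

import "fmt"

// makeVsNew shows what each built-in hands back.
func makeVsNew() (sliceLen int, pointee int, mapLen int) {
	buf := make([]byte, 1024) // make: an initialized slice of 1024 zero bytes
	p := new(int)             // new: a *int pointing at a zeroed int
	m := make(map[string]int) // maps need make (or a literal) before writing
	m["a"] = 1
	return len(buf), *p, len(m)
}

func main() {
	fmt.Println(makeVsNew())
}
```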

The assignment operators can be “:=” if you are declaring the variable while initializing. Otherwise it is “=”. Is this to help tell declaration from re-assignment? Or is there some other logic to it? Maybe then it makes sense. Otherwise seems like just some more special characters mixed up.

Declare a function with return value (example(https://tour.golang.org/basics/7)):

	func split(sum int) (x, y int) {

So here split() takes an integer sum value as parameter and returns two integer values named x and y. Again, what was wrong with the return value on the left? Same complaints as I had with the array declaration. No idea.
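
For completeness, the full function looks like this, as far as I recall it from the linked Tour page. The named return values x and y act as ordinary variables inside the function, and a bare “return” hands back their current values:

```go
package main

import "fmt"

// split has two named results; the bare return sends back x and y.
func split(sum int) (x, y int) {
	x = sum * 4 / 9
	y = sum - x
	return
}

func main() {
	fmt.Println(split(17))
}
```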

To create a string by concatenating a string and a number:

	"localhost" + ":" + strconv.Itoa(8080)

So you can do “localhost”+”:” for two strings. But not for numbers. What was wrong with “localhost:”+8080? Or even “localhost:”+str(8080)? It’s a small thing but seems like something that I would do often.
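
Two alternatives from the standard library that avoid the manual concatenation, sketched here with made-up function names: fmt.Sprintf does the number formatting for you, and net.JoinHostPort builds the host:port string (it still wants the port as a string, though).

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// addrSprintf formats host and port in one go.
func addrSprintf(host string, port int) string {
	return fmt.Sprintf("%s:%d", host, port)
}

// addrJoin uses net.JoinHostPort, which also brackets IPv6 hosts.
func addrJoin(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(addrSprintf("localhost", 8080))
	fmt.Println(addrJoin("localhost", 8080))
}
```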

Documentation. I know it is fashionable to diss Java and all. But I like the approach of clearly stating in Javadocs what the parameters and return values are. Sometimes it gives way too much repetition and is just silly. But for the official libs and docs etc. at least it is nice. Excerpt from the Go “io” package, the doc for WriteString (https://golang.org/pkg/io/#WriteString):

———-

func WriteString

func WriteString(w Writer, s string) (n int, err error)

WriteString writes the contents of the string s to w, which accepts a slice of bytes. If w implements a WriteString method, it is invoked directly. Otherwise, w.Write is called exactly once.

———-

OK, so what is “n”, what values might “err” take and under what circumstances, etc.? I had plenty of such experiences in building my little app.

Even if there are no classes etc., there is something called an “interface”. Haven’t quite figured it out, but wanted to hack the logging a bit and had to try to figure it out.

func debuglog(msg string, v ...interface{}) {
	if loggingEnabled {
		log.Printf(msg, v...)
	}
}

I guess that is some way to generally refer to whatever type is given. The “…” notation (oddly on the right…) just defines that there can be any number of arguments. And you need it both in the parameter and in use. I should probably read up more on what the interface is and does, so I shall not complain too much about it.

Anyway, I could go on about the odd-ish syntax where you put lots of “_:=<-” characters around. But overall, after giving Go a bit of a Go with the TCP forwarder, I do think it is actually quite a nice language. It just takes a bit of getting used to. The concurrency-related stuff with the goroutines and channels, defers et al. is very nice.
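
A small sketch of how the variadic “interface{}” parameter behaves, with made-up function names. Inside the function, v is just a slice of values of any type, and “v...” unpacks it again when passing it on:

```go
package main

import "fmt"

// countArgs shows that the variadic parameter arrives as a slice.
func countArgs(v ...interface{}) int {
	return len(v)
}

// format passes the arguments on to fmt.Sprintf by unpacking the slice.
func format(msg string, v ...interface{}) string {
	return fmt.Sprintf(msg, v...)
}

func main() {
	fmt.Println(countArgs("a", 1, 2.5))
	fmt.Println(format("%s:%d", "localhost", 8080))
}
```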

There we Go.

Playing with Ruuvitags and Raspberries

I finally received my Ruuvitag sensors from the Kickstarter campaign a few weeks back. Everyone else seemed to get theirs months before (boohoo) but finally they did arrive. So now that I had them, what could I do with them? After a few weeks of finishing Lennu Run, I had the time to try them.

EXECUTIVE SUMMARY: If you just want the pretty pics and the link to go play Lennu Run, skip to the end. If you like pages of digressing nerd-talk, keep reading all the way.

I was mainly interested to try prototyping some “smart home” type stuffs with them. There was a post before on the Ruuvitag site about how someone had set up collecting the data into InfluxDB and visualizing with Grafana. Since that has recently been my setup for IoT/timeseries data collection and visualization, I figured that sounds like a good start. As it was implemented in Java it made it even better suited for me. Because I know it well.

So I got myself a Raspberry Pi 3 box with all the twinkies to go with it. To act as the Bluetooth hub to collect the measurements from the actual sensor tags and push into the actual database. I run InfluxDB on another host in my network, along with Grafana. So there.

Installed Raspbian on the Pi, downloaded the Ruuvi Collector code from Github. Tried to compile it and make it run. It uses some command line tools called “hcitool” and “hcidump” to collect the data from the tags (over Bluetooth Low Energy – BLE). What a weird way to do stuff – parse command line processes from a Java program. One would think proper Bluetooth support would exist in Java but then again one would think many things that are never true. If it works and is free.. My thanks.

Of course, being a nerd I just had to fork it and change it in multiple non-essential ways just to see how it works and to pretend to simplify it for myself. See Github.

Anyway. Raspbian was used in the Ruuvi Collector example as a hub, so I used it. Of course, the hci-tool versions on Raspbian are old and outdated, and the Ruuvi Collector website actually mentions you should upgrade them. OK, can’t be that hard can it? Yes it can.

Newer versions are not in the Raspbian repositories. Downloaded the sources for the packages online, and after lots of messing around, finally got them to compile. Overall, a bit more complicated than I was looking for, and I could not get them to work. No idea why really, and no time/resources to debug the source code. Others had some issues as well, so no go for me. Nice. I thought RPi was supposed to be an easy and nice way to do all this stuff for dummies like me. Obviously, I thought wrong.

Alternatives? People on the internets talk about using the “stretch” version of Raspbian, which uses newer versions. Stretch seems to be some kind of a testing branch of the OS distribution. There is some mention about just taking the BLE packages from there and leaving the rest as the Jessie version (the current version of Raspbian). Others complain about the potential to mess the system up. So why don’t I just upgrade my RPi to stretch as a whole? Because.

I then ran the whole Stretch upgrade to make my Raspbian upgrade fully to the stretch version. I figured all dependencies would better work and all. Haha. First off, the Raspbian desktop changed so I no longer could even find where to configure wireless connections (wifi/bluetooth). Somehow my previous configs still seemed OK as wifi worked so just SSH in and try it. “hcitool” and “hcidump” both were also installed and new enough versions. I also upgraded the kernel with rpi-update just to be sure. So am I all set? Of course not.

The hci-tools complained about not finding the BT device. So install a bunch of BT packages for Pi. No more errors. But running the command line tools, I expect they should dump BT traffic out. I see no data at all. How nice. Tried to fix that with all sorts of tricks for half a day. Then I had enough and re-installed Raspbian to the Jessie version. Found some links to instructions by the Ruuvi Collector author (scrin) on Ruuvi Slack about only downloading the bluetooth packages from the stretch repo and leaving the rest as is. After doing that, the tools were finally right versions and they actually see some data. How nice.

Some useful commands in this process:

Give the commandline tools the needed permissions (from RC site):

sudo setcap 'cap_net_raw,cap_net_admin+eip' `which hcitool`
sudo setcap 'cap_net_raw,cap_net_admin+eip' `which hcidump`

If the RuuviCollector keeps exiting with no message, try these (the collector should tell you to do so):

hcitool lescan --duplicates
hcidump --raw

Running these, you should see all sorts of live captured bluetooth data printed on the console. As I mentioned, I had issues with various versions of the hci-tools. Even if you get no errors running these two commands (hcitool and hcidump), it does not mean the hci-tools would not have issues. I initially got errors trying to run these two commands. After various fixes, they would start but not log anything. So no errors but no data either. Only after installing the Raspbian Stretch hci-tool versions on top of otherwise plain Raspbian Jessie install they started to print all the BT traffic, and I figured they were finally working.

Also, might be useful to try (just in case missing some BT stuffs from Raspbian):

sudo apt-get install pi-bluetooth

Install InfluxDB somewhere with a network connection accessible from the Pi to have a place for the data. Really very simple to do (even for me), so no big instructions here, the link is good.

To keep the Ruuvi collector running, install the “screen” command on Raspbian and create the virtual screen to keep the RC running:

sudo apt-get install screen
screen -S ruuvicollector

To get the stretch versions of the BT packages and the hci tools (yes, I used emacs):

sudo emacs /etc/apt/preferences.d/jessie.pref
Package: *
Pin: release a=jessie
Pin-Priority: 900

sudo emacs /etc/apt/preferences.d/stretch.pref
Package: *
Pin: release a=stretch
Pin-Priority: 750

sudo emacs /etc/apt/sources.list.d/jessie.list
deb http://mirrordirector.raspbian.org/raspbian/ jessie main contrib non-free rpi

sudo emacs /etc/apt/sources.list.d/stretch.list
deb http://mirrordirector.raspbian.org/raspbian/ stretch main contrib non-free rpi

sudo apt update
sudo apt install bluez -t stretch
sudo apt install bluez-hcidump -t stretch

After all this was finally running, I updated all my Ruuvi tags to the latest Weather Station firmware, set them to high-precision mode (press the B button inside the Ruuvitag so the faint red led blinks twice a second), and tagged them (with pencil on cover) with their MAC address so I know which one is which, and deployed them around the house. In the fridge, the freezer, the living room, outside, and in the sauna. The basic dashboard looks something like this:

[image: ruuvidash]

I picked up the use of grouped SingleStat panels with Sparklines for nice effect from scrin’s dashboards. (S)he also has some nice stats panels there, should look into those as well someday. There is a bunch of interesting information to be had from these measurements and dashboards by moving the tags around the places they are in, measuring temperature, humidity, pressure etc around different places in appliances and stuffs. And figuring out why the temperatures seem too high only to find there must be something wrong but not with the tag..

Perhaps more interestingly, I figured it would be nice to try something a bit different as well. So I was hoping to use the sensors not just to capture temperature, humidity and air pressure, but also to use the accelerometer to capture some events.

Looking at the accelerometer numbers, I couldn’t quite understand the readings at first. They were anything from 0 to 1 even if you don’t move it / accelerate at all. Again, scrin tried to explain it to me on the Ruuvi Slack. The important thing to get, I guess, is simply that the change in those values is what matters, not what the reading is so much. Or maybe sometimes it is, I just don’t get them so well. Software nerd-alert. Some graphs from playing with a tag below:

flipping

Here, the tag was initially on level surface with the bottom down as you might expect. At point 1, I flipped it upside down (belly up). This moved Z acceleration value from 1 to -1 while X and Y stayed constant. At point 2 I turned the tag on its side. At points 3 and 4 I rotated it on its side a bit each time. What do those changes mean? I have no idea! But it moves! It rotates! Ooh.

I put the fridge temperature monitor in the fridge door, on one of the door shelves. I figured maybe I could use the accelerometer to capture the door movement to see when the fridge is being used and when not, or how long the door is open, by capturing the spikes in the acceleration of the sensor as the door is opened and closed. Probably the profiles would also look different, as the opening usually is a bigger jerk than closing. Some graphs of the experiment below.

fridge1b

I highlighted two red areas where in the first one I was opening the door and in the second one I was closing it. You could very well set up some nice algorithms from these, and integrate them into the data-stream to get the events. But the results were not consistent. Quite often it would miss the door being opened.

After trying for a bit and asking some questions on the Ruuvi Slack, I figured it does not work quite this way. The accelerometer only captures changes in the movement speed. And even at the high-frequency setting, the sensor is only capturing the data twice a second. So if you start to pull the door open in between, nothing shows in the accelerometer graph. The tag is steadily moving before and after the initial pull, but not accelerating any more. So I guess unless the measurement polling happens just at the time of the initial pull, the curve just stays flat.
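The "change between readings is what matters" idea can be sketched as a simple threshold on the difference between consecutive samples. This is only an illustration: the class name, the threshold and the sample values are made up, and as described above, a detector like this can only see a door pull that happens to land on one of the two samples per second.

```java
import java.util.ArrayList;
import java.util.List;

class SpikeDetector {
  /**
   * Returns the indices of samples whose value differs from the
   * previous sample by more than the given threshold.
   */
  static List<Integer> findSpikes(double[] samples, double threshold) {
    List<Integer> spikes = new ArrayList<>();
    for (int i = 1; i < samples.length; i++) {
      if (Math.abs(samples[i] - samples[i - 1]) > threshold) {
        spikes.add(i);
      }
    }
    return spikes;
  }
}
```

For the made-up readings {0.02, 0.03, 0.02, 0.35, 0.04, 0.03} and a threshold of 0.2, findSpikes reports indices 3 and 4: the jump up and the drop back down.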

I also tried to set the sensor in the door shelf on its side, as it is round, so it would roll a bit when the door is opened and give some acceleration readings. The graph showing one such event is below. So if I managed to get the sensor to roll, the event is bigger. But again this does not happen all the time.

fridge2

What else could I play with?

I tried to take one of the tags and set it hanging from a string/ribbon. Then simulate some wind to see if it would work to measure wind speed with the accelerometer. So like the fridge door where you get the relevant temperature, humidity, and air pressure anyway and just want to play with the potential acceleration as well. See my awesome setup below.

hanging

See. At least I made it hang from a fancy ribbon. Of course I did not leave it by the door or the wall. So I let it hang freely and tried to put a fan on it. Turns out I don’t have any good fans (breaking news..). So I asked the kids to blow on it. The figure below shows the accelerometer readings for this.

puhallusdash

On mark “1” is the first bigger blow. Mark “2” is when they tried to say BOO to it, and blow on it very gently. Maybe the tag got just a bit scared? Mark “3” is the final attack and boss fight. Or something like that. Verdict. Plausible. A reference wind speed meter would be nice for tuning the algorithms for the data though.

Finally, I also thought about putting the tag in water, leaving it floating, and seeing if it could catch any interesting readings with the accelerometer, such as water drops falling on it at different rates (rain). Water temperature? Or a sail boat? 🙂 See below, at least it floats. Not by much but anyway.

floater

So it all seemed good to go for a short while with a very flat line. Then the flatline got really flat, and I got no readings any more. So I figured I drowned it and it is dead. Fished it out and it started pushing readings immediately. I guess water kind of blocks radio waves and bluetooth very nicely. I am just a software nerd so what do I know. But it’s nice to play with toys. That’s what I keep telling myself..

Anyway, for illustration, see the lines below. The highlighted blue parts are where the silence happened. First time when I was testing it, second time when I put it back after to take pics. ’cause Pics or It Didn’t Happen. The general dottiness is generally just how the tag sends data, some packets don’t seem to make it or so. The other graphs I put here are just using connected mode in Grafana to bridge the gaps.

vesidash

Overall, I would say the Ruuvi tags are a nice way to meter your house and surrounding areas. Quite a bit of setup to get all this to work still. Ruuvitag was actually very easy to take into use and update. Just the Raspbian bluetooth side was a load of trouble.

The more interesting internet of things has to wait a bit still I guess. Maybe it will start with my HP printer that has this nice feature of opening up an unsecured WiFi hotspot that cannot be disabled unless you disable the whole WiFi in it. Well, the Ruuvis are only broadcasting, so better in that sense at least.. Although I have no idea about the security of the Ruuvi Collector either.. 🙂

And now for something completely different. As you made it this far, go install Lennu Run for Android and iOS, play it and give it a 5-star rating! 😀

Collecting java.util.logging to log4j2

Everybody wants to write a log. And in Java everybody wants to write their own logging framework or at least use one of the many different ones. Then someone comes up with a logging framework framework such as SLF4J.

OK but what was I about to say. As so many times, I had a piece of Java software writing a log file using Log4J2. I was using some libs/someone else’s code that uses java.util.logging to write their log. I wanted to capture those logs and include them in my Log4J2 log file for debugging, error resolution or whatever.

This case was when trying to log errors from the InfluxDB Java driver. The driver uses java.util.logging for minimal external dependencies or something. I used Log4J2 in my app.

So the usual question of how do you merge java.util.logging code, that you do not control, with your own code using Log4J2 to produce a single unified log file?

Most Googling would tell me all about SLF4J etc. I did not want yet-another framework on top of existing frameworks, and yet some more (transitive) dependencies and all sorts of weird stuff. Because I am old and naughty and don’t like too many abstractions just because.

So here is the code to do this with zero additional dependencies.

First a log Handler object for java.util.logging to write to Log4J2:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/**
 * Forwards java.util.logging records to Log4J2.
 *
 * @author Daddy Bigbelly.
 */
public class JekkuHandler extends Handler {
  //notice that this is the Log4J2 logger here, inside a java.util.logging Handler object
  private static final Logger log = LogManager.getLogger();

  @Override
  public void publish(LogRecord record) {
    //map the java.util.logging level to the closest Log4J2 level
    Level level = record.getLevel();
    if (level.intValue() >= Level.SEVERE.intValue()) {
      log.error(record.getMessage(), record.getThrown());
    } else if (level.intValue() >= Level.WARNING.intValue()) {
      log.warn(record.getMessage(), record.getThrown());
    } else if (level.intValue() >= Level.INFO.intValue()) {
      log.info(record.getMessage(), record.getThrown());
    } else {
      log.debug(record.getMessage(), record.getThrown());
    }
  }

  //nothing is buffered in this handler, so there is nothing to flush or close
  @Override
  public void flush() {}

  @Override
  public void close() throws SecurityException {}
}

Next setting it up and using it, with the InfluxDB Java driver as an example:

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;
import org.influxdb.impl.BatchProcessor;

import java.util.concurrent.TimeUnit;
import java.util.logging.Handler;
import java.util.logging.Logger;

/**
* @author Daddy Bigbelly.
*/

public class LogCaptureExample {
  public static void main(String[] args) throws Exception {
    //oh no the root password is there
    InfluxDB db = InfluxDBFactory.connect("http://myinfluxdbhost:8086", "root", "root");
    String dbName = "aTimeSeries";
    db.createDatabase(dbName);
    db.enableBatch(2000, 1, TimeUnit.SECONDS);

    //if you look at the influxdb driver code for batchprocessor,
    //where we wanted to capture the log from, you see it using the classname to set up the logger.
    //so we use the same classname here to hijack the writes for that logger (the one we want to capture)
    Logger logger = Logger.getLogger(BatchProcessor.class.getName());
    Handler handler = new JekkuHandler();
    logger.addHandler(handler);

    //this runs forever, but the batch mode can throw an error if the network drops.
    //so disconnect network to test this in middle of execution
    while (true) {
      Point point1 = Point.measurement("cpu")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("idle", 90L)
        .addField("user", 9L)
        .addField("system", 1L)
        .build();
      db.write(dbName, "autogen", point1);
    }
  }
}

You could probably quite easily attach the same handler to the global (root) java.util.logging logger and that way capture all logging written with java.util.logging. I did not need it so it’s not here.
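A rough sketch of what such a global capture could look like: java.util.logging records propagate up to the root logger (the one with the empty name), so a handler attached there sees everything the child loggers let through. To keep the example free of external dependencies, the hypothetical CollectingHandler below just stores messages in a list; in the real thing it would be the JekkuHandler from above. Removing the default handlers is an assumption about what you want, not a requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class RootCaptureSketch {
  /** Hypothetical stand-in for JekkuHandler: keeps messages in memory. */
  static class CollectingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      if (!isLoggable(record)) return; // respect this handler's level and filter
      messages.add(record.getLoggerName() + ": " + record.getMessage());
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  /** Attaches a collecting handler to the root logger and returns it. */
  static CollectingHandler captureEverything() {
    Logger root = Logger.getLogger("");
    // drop the default handlers (typically the console) to avoid duplicate output
    for (Handler existing : root.getHandlers()) {
      root.removeHandler(existing);
    }
    CollectingHandler handler = new CollectingHandler();
    root.addHandler(handler);
    return handler;
  }
}
```

Records below the root logger's level (INFO by default) are dropped before they reach any handler, so the level would need to be lowered with root.setLevel(...) if debug output is wanted too.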

In a similar way, you should be able to capture java.util.logging to any other log framework just by changing where the custom Handler writes the logs to.

Well there you go. Was that as exciting for you as it was for me?

Building a (Finnish) Part of Speech Tagger

I wanted to try a part-of-speech (POS) tagger to see if it could help me with some of the natural language processing (NLP) problems I had. This was in Finnish, although other languages would be nice to have supported for the future. So off I went, (naively) hoping that there would be some nicely documented, black-box, open-source, free packages available. Preferably, I was looking for one in Java as I wanted to try using it as part of some other Java code. But other (programming) languages might work as well, if it is possible to use them as a service or something. Summary: There are a bunch of cool libs out there, you just need to learn POS tagging and some more NLP terms to train them first…

I remembered all the stuffs on Parsey McParseface, SyntaxNet and all those hyped Google things. It even advertises achieving 95% accuracy on Finnish POS tagging. How cool would that be. And it’s all about deep learning, TensorFlow, Google Engineers and all the other greatest and coolest stuff out there, right? OK, so all I need to do is go to their GitHub site, run some 10 steps of installing various random sounding packages, mess up my OS configs with various Python versions, settings, and all the other stuff that makes Python so great (OK let’s not get upset, it’s a great programming language for stuffs :)). Then I just need to check out the SyntaxNet git repo, run a build script for an hour or so, set up all sorts of weird stuff, and forget about a clean/clear API. OK, I pass, after messing with it too long.

So. After trying that mess, I Googled, Googled, Duckducked, and some more for some alternatives better suited for me. OpenNLP seemed nice as it is an Apache project, and those have generally worked fine for me. There are a number of different models for it at SourceForge. Some of them are even POS tagger models. Many nice languages there. But no Finnish. Now, there is an option to train your own model, which seems to require some oddly formatted, pre-tagged text sets to train. I guess that just means POS tagging is generally seen as a supervised learning problem. Which is fine, it’s just that if you are not deep in the NLP/POS tagging community, these syntaxes do look a bit odd. And I just wanted a working POS tagger, not a problem of trying to figure out what all these weird syntaxes are, or a problem of going to set up a project on Mechanical Turk or whatever to get some tagged sentences in various languages.

What else? There is a nice looking POS tagger from the Stanford NLP group. It also comes with out-of-the-box models for a few languages. Again, no Finnish there either but a few European ones. Promising. After downloading it, I managed to get it to POS tag some English sentences and even do lemmatization for me (finding the dictionary base form of the word, if I interpret that term correctly). Cool, certainly useful for any future parsing and other NLP tasks for English. They also provide some instructions for training it for new languages.

This training again requires the same pre-annotated set of training data with POS tagging. Seeing some pattern here.. See, even I can figure it out sometimes. So there is actually a post on the internets, where someone describes building a Swedish POS tagger using the Stanford tagger. And another one instructing people (in comments) to download the tagger code and read it to understand how to configure it. OK, not going to do that. I just wanted a POS tagger, not an excursion into some large code base to figure out some random looking parameters that require a degree in NLP to understand them. But hey, Sweden is right next to Finland, maybe I can try the configuration used for it to train my own Finnish POS tagger? What a leap of logic I have there..

I downloaded the Swedish .props file for the Stanford tagger, and now just needed the data. Which, BTW, I needed also for all the others, so I might as well have gone with the OpenNLP as well and tried that, but who would remember that anymore at this point.. The Swedish tagger post mentioned using some form of Swedish TreeBank data. So is there a similar form of Finnish TreeBank? I remember hearing that term. Sure there is. So downloaded that. Unpack the 600MB zip to get a 3.8GB text file for training. The ftb3.1.conllx file. Too large to open in most text editors. More/less to the rescue.

But hey, this is sort of like big data, which this should be all about, right? Maybe the Swedish .props file just works with it, after all, both are Treebanks (whatever that means)? The Swedish Treebank site mentions having a specific version for the Stanford parser built by some Swedish treebank visitor at Googleplex. Not so for Finnish.

Just try it. Of course the Swedish .props file won’t work with the Finnish TreeBank data. So I built a Python script to parse it and format it more like the Swedish version. Words one per line, sentences separated with linefeeds. The tags seem to differ across various files around but I have no idea about how to map them over so I just leave them and hope the Stanford people have it covered. (Looking at it later, I believe they all treat it as a supervised learning problem with whatever target tags you give.)
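For illustration, the conversion step might look roughly like this if done in Java instead of Python. It assumes the standard ten-column, tab-separated CoNLL-X layout (word form in the second column, fine-grained POS tag in the fifth). The class name, the choice of the POSTAG column and the tab-separated output are assumptions made for the sketch, not necessarily what the original script or the Stanford training format used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

class ConllxConverter {
  // zero-based column indexes in the CoNLL-X format
  static final int FORM = 1;   // the word as it appears in the text
  static final int POSTAG = 4; // fine-grained part-of-speech tag

  /**
   * Converts one input line. Returns "" for a blank line (sentence boundary),
   * null for a line that is not a token line, and "word<TAB>tag" otherwise.
   */
  static String convertLine(String line) {
    if (line.trim().isEmpty()) return "";
    String[] cols = line.split("\t");
    if (cols.length <= POSTAG) return null;
    return cols[FORM] + "\t" + cols[POSTAG];
  }

  /** Streams the input line by line, so the whole file never has to fit in memory. */
  static void convert(BufferedReader in, Writer out) throws IOException {
    String line;
    while ((line = in.readLine()) != null) {
      String converted = convertLine(line);
      if (converted != null) {
        out.write(converted);
        out.write('\n');
      }
    }
  }
}
```

Reading line by line matters here mostly because of the file size: a multi-gigabyte file that does not open in a text editor is also not something to load into a single string.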

Tried the transformed file with the Stanford POS tagger. My Python script tells me the file has about 4.4 million sentences, with about 76 million words or something like that. I give the tagger JVM 32GB memory and see if it can handle it. No. Out of memory error. Oh dear. It’s all I had. After a few minor modifications in the .props file, I make the training data set smaller and smaller until finally, at 1M sentences, the tagger finishes training.

Meaning the program runs through and prints nothing (no errors but nothing else either). There is a model file generated I can use for tagging. But I have no idea if this is any good or not, or how badly I just trained it. Most of the training parameters have a one-line description in the Javadoc, which isn’t hugely helpful (for me). Somehow I am not too confident I managed to do it too well. Later as I did various splits on the FinnTreeBank data for my customized Java tagger and the OpenNLP tagger, I also tried this one with the 1.4M sentence test set. Got about 82% accuracy, which seems pretty poor considering everything else I talk about in the following. So I am guessing my configuration must have been really off since otherwise people have reported very good results with it. Oh well, maybe someone can throw me a better config file?

This is what running the Stanford tagger on the 1M sentence set looked like on my resource graphs:

stanford1m

So it mostly runs on a single core and uses about 20GB of RAM for the 1M sentence file. But obviously I did not get it to give me good results, so what other options do I have?

During my Googling and stuff I also ran into a post describing writing a custom POS tagger in 200 lines of Python. Sounds great, even I should be able to get 200 lines of Python, right? I translated that to Java to try it out on my data. Maybe I will call my port “LittlePOS”. Make of that what you will :). At least now I can finally figure out what the input to it should be and how to provide it, since I wrote (or translated) the code, eh?

Just to quickly recap what (I think) this does.

  • Normalize all words = lowercase words, change year numbers to “!YEAR” and other numbers to “!DIGIT”.
  • Collect statistics for each word, how often different POS tags appear for each word. A threshold of 97% is used to mark a word as “unambiguous”, meaning it can always be tagged with a specific tag if it has that tag 97% or more times in the training data. The word also needs to occur some minimum number of times (here it was 20).
  • Build a set of features for each POS tag. These are used for the “machine learning” part to learn to identify the POS tag for a word. In this case the features used were:
    • Suffix of word being tagged. So its last 3 letters in this case.
    • Prefix of word being tagged. Its first letter in this case.
    • Previous tag. The tag assigned to previous word in sentence.
    • 2nd previous tag. The tag assigned to the previous word to the previous word :).
    • Combination of the previous and previous-previous tags. So previous tag-pair.
    • The word being tagged itself.
    • Previous tag and current-word pair.
    • Previous word in sentence.
    • Suffix of previous word, its 3 last letters.
    • Previous-previous word. So back two spots in the sentence where we are tagging.
    • Next word in sentence.
    • Suffix of next word. Its 3 last letters.
    • Next-next word in sentence. So the next word after the next word.
    To account for the start and end of a sentence, the sentence word array is always initialized with START1, START2 and END1, END2 “synthetic words”. So these features also work even if there is no real previous or next word in the sentence. Also, a word can be anything, including punctuation marks.
  • Each of the features is given a weight. This is used to calculate prediction of what POS tag a word should get based on its features in the sentence.
  • If, in training, a word is given (predicted) a wrong tag based on its features, the weights of those features for the wrong tag are reduced by 1 each, and the weights of those features for the correct tag are increased by 1 each.
  • If the tag was correctly predicted, the weights stay the same.
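The normalization and the weight update described in the list above can be sketched in a few lines of Java. This is a simplified illustration rather than the actual LittlePOS or Python code: the class and method names are made up, the exact normalization rules (4-digit number means year) are an assumption, features are plain strings such as "suffix=ssa", and things like shuffling, multiple iterations and the "unambiguous" word lookup are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PerceptronSketch {
  // feature -> (tag -> weight)
  private final Map<String, Map<String, Integer>> weights = new HashMap<>();
  // all tags seen so far, sorted so that ties are broken deterministically
  private final Set<String> tags = new TreeSet<>();

  /** Lowercases a word; 4-digit numbers become !YEAR, other tokens starting with a digit become !DIGIT. */
  static String normalize(String word) {
    if (word.matches("\\d{4}")) return "!YEAR";
    if (!word.isEmpty() && Character.isDigit(word.charAt(0))) return "!DIGIT";
    return word.toLowerCase(Locale.ROOT);
  }

  /** Sums the weights of the given features per tag and returns the best scoring tag. */
  String predict(List<String> features) {
    Map<String, Integer> scores = new HashMap<>();
    for (String feature : features) {
      Map<String, Integer> perTag = weights.get(feature);
      if (perTag == null) continue; // feature has no weights yet
      for (Map.Entry<String, Integer> e : perTag.entrySet()) {
        scores.merge(e.getKey(), e.getValue(), Integer::sum);
      }
    }
    String best = null;
    int bestScore = Integer.MIN_VALUE;
    for (String tag : tags) {
      int score = scores.getOrDefault(tag, 0);
      if (score > bestScore) {
        best = tag;
        bestScore = score;
      }
    }
    return best; // null only if no tag has been seen yet
  }

  /** One training step for one word, given its features and its correct tag. */
  void train(List<String> features, String correctTag) {
    tags.add(correctTag);
    String guess = predict(features);
    if (correctTag.equals(guess)) return; // correct prediction: weights stay the same
    for (String feature : features) {
      Map<String, Integer> perTag = weights.computeIfAbsent(feature, k -> new HashMap<>());
      perTag.merge(correctTag, 1, Integer::sum); // +1 for the correct tag
      perTag.merge(guess, -1, Integer::sum);     // -1 for the wrongly predicted tag
    }
  }
}
```

Storing the weights per feature and then per tag means a prediction only touches the features that are present for the word being tagged, which is what keeps this kind of tagger fast.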

Getting this basic idea also helps me understand the other parsers and their parameters a bit better. I think this is what is defined by the “arch” parameter in the Stanford tagger props file, and would maybe need a better fix? I believe this setting of parameters must be one of the parts of POS tagging with the most diverse sets of possibilities as well.. Back to the Stanford tagger. It also seemed a bit slow at 50ms average tagging time per sentence, compared to the other ones I discuss in the following. Not sure what I did wrong there. But back to my Python to Java porting.

I updated my Python parser for the FinnTreeBank to produce just a file with the word and POS tag extracted and fed that to LittlePOS. This still ran out of memory on the 4.4M sentences with 32GB JVM heap. But not in the training phase, only when I finally tried to save the model as a Protocol Buffers binary file. The model in memory seems to get pretty big, so I guess the protobuf generator also ran out of resources when trying to build a 600MB file with all the memory allocated for the tagger training data.

In the resources graph this is what it looks like for the full 4.4M sentences:

protobuf5m_lowuse

The part on the right where the “system load” is higher and the “CPU” part looks to bounce wildly is where the protobuf is being generated. The part on the left before that is the part where the actual POS tagger training takes place. So the protobuf generation actually was running pretty long, my guess is the JVM memory was low and way too much garbage collection etc. is happening. Maybe it would have finished after few more hours but I called it a no-go and stopped it.

3M sentences finishes training fine. I use the remaining 1.4M for testing the accuracy. Meaning I use the trained tagger to predict tags for those 1.4M sentences and count how many words it tagged right in all of those. This gives me about 96.1% accuracy on using the trained tagger. Awesome, now I have a working tagger??

The resulting model for the 3M sentence training set, when saved as a protobuf binary, is about 600MB. Seems rather large. Probably why it was failing to write it with the full 4.4M sentences. A smaller size model might be useful to make it more usable in a smaller cloud VM or something (I am poor, and cloud is expensive for bigger resources..). So I tried to train it on sentence sets of size 100k to 1M in 100k increments. And on 2M and 3M sentences. Results for LittlePOS are shown in the table below:

Sentences | Words correct | Accuracy | PB Size | Time/1
100k | 21988662 | 88.7% | 90MB | 4.5ms
200k | 22490881 | 90.7% | 153MB | 4.1ms
300k | 22608641 | 91.2% | 195MB | 3.9ms
400k | 22779163 | 91.9% | 233MB | 3.8ms
500k | 22911452 | 92.4% | 268MB | 3.7ms
600k | 23033403 | 92.9% | 304MB | 3.5ms
700k | 23095784 | 93.1% | 337MB | 3.7ms
800k | 23149286 | 93.4% | 366MB | 3.5ms
900k | 23169125 | 93.4% | 390MB | 3.2ms
1M | 23167721 | 93.4% | 378MB | 3.3ms
2M | 23520297 | 94.8% | 651MB | 3.0ms
3M | 23843609 | 96.2% | 890MB | 2.0ms
1M_2 | 23105112 | 93.2% | 467MB | ms
3M_0a | 20859104 | 84.1% | 651MB | 1.7ms
3M_0b | 22493702 | 90.7% | 651MB | 1.7ms

Here

  • Sentences is the number of sentences in the dataset.
  • Words correct is the number of words correctly predicted. The total number of words is always 24798043 as all tests were run against the last 1.4M sentences (ones left over after taking the 3M training set).
  • Accuracy is the percentage of all predictions that it got right.
  • PB Size is the size of the model as a Protocol Buffers binary after saving to disk.
  • Time/1 is the time the tagger took on average to tag a sentence.
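The accuracy column is just the first number divided by the fixed total. A tiny sketch of the arithmetic (the class and method names are made up for the example):

```java
class AccuracySketch {
  /** Percentage of correctly tagged words, rounded to one decimal like in the table. */
  static double accuracyPercent(long correct, long total) {
    double percent = 100.0 * correct / total;
    return Math.round(percent * 10) / 10.0;
  }
}
```

With the numbers from the table, accuracyPercent(20859104L, 24798043L) gives 84.1, which matches the 3M_0a row.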

The line with 1M_2 shows an updated case, where I changed the training algorithm to run for 50 iterations instead of the 10 it had been set to in the Python script. Why 50? Because the Stanford and OpenNLP seem to use a default of 100 iterations and I wanted to see what difference it makes to increase the iteration count. Why not 100? Because I started it with training the 3M model for 100 iterations and looking at it, I calculated it would take a few days to run. The others were much faster so plenty of room for optimization there. I just ran it for 1M sentences and 50 iterations then, as that gives an indication of improvement just as well.

So, the improvement seems pretty much zero. In fact, the accuracy seems to have gone slightly down. Oh well. I am sure I did something wrong again. It is also possible to track the number of correctly predicted tags over the added iterations during training. The figure below illustrates this:

test

This figure shows how much of the training set the tagger got right during the training iterations. So maybe the improvement in later iterations is not that big due to the scale but it is still improving. Unfortunately, in this case, this did not seem to have a positive impact on the test set. There are also a few other points of interest in the table.

Back to the results table. The line with 3M_0a shows a case where all the features were ignored. That is, only the “unambiguous” ones were tagged there. This already gives the result of 84.1%. The most frequent tag in the remaining untagged ones is “noun”. So tagging all the remaining 15.9% as nouns gives the score in 3M_0b. In other words, if you take all the words that seem to clearly only have one tag given for them, give them that tag, and tag all the remaining ones as nouns, you get about 90.7% accuracy. I guess that would be the reference to compare against.. This score is without any fancy machine learning stuffs. Looking at this, the low score I got for training the Stanford POS tagger was really bad and I really need that for dummies guide to properly configure it.
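That reference baseline is simple enough to sketch in full. The hypothetical BaselineTagger below takes per-word tag counts collected from the training data, keeps the words that pass the 97% threshold and the minimum count of 20 described earlier, and falls back to a fixed tag (noun in the experiment above) for everything else. The names, the shape of the input map and the tag labels in the usage note are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

class BaselineTagger {
  private final Map<String, String> unambiguous = new HashMap<>();
  private final String fallbackTag;

  /**
   * @param tagCounts word -> (tag -> how many times the word had that tag in training)
   * @param minCount minimum number of occurrences before a word is trusted
   * @param threshold share of occurrences the top tag needs, for example 0.97
   * @param fallbackTag tag given to every word that is not unambiguous
   */
  BaselineTagger(Map<String, Map<String, Integer>> tagCounts,
                 int minCount, double threshold, String fallbackTag) {
    this.fallbackTag = fallbackTag;
    for (Map.Entry<String, Map<String, Integer>> word : tagCounts.entrySet()) {
      int total = 0;
      int bestCount = 0;
      String bestTag = null;
      for (Map.Entry<String, Integer> tag : word.getValue().entrySet()) {
        total += tag.getValue();
        if (tag.getValue() > bestCount) {
          bestCount = tag.getValue();
          bestTag = tag.getKey();
        }
      }
      if (bestTag != null && total >= minCount && bestCount >= threshold * total) {
        unambiguous.put(word.getKey(), bestTag);
      }
    }
  }

  /** Plain map lookup, no features or weights involved. */
  String tag(String word) {
    return unambiguous.getOrDefault(word, fallbackTag);
  }
}
```

A word seen 100 times with a single tag gets that tag, while a word split 60/40 between two tags, or one seen only 5 times, gets the fallback.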

But wait, now that I have some tagged input data and Python scripts to transform it into different formats, I could maybe just modify these scripts to give me OpenNLP compliant input data? Brilliant, let’s try that. At least OpenNLP has default parameters and seems more suited for dummies like me. So on to transform my FinnTreeBank data to OpenNLP input format and run my experiments. Python script. Results below.

Sentences | Words correct | Accuracy | PB Size | Time/1
100k | 22247182 | 89.7% | 4.5MB | 7.5ms
200k | 22680369 | 91.5% | 7.8MB | 7.6ms
300k | 22861728 | 92.2% | 10.4MB | 7.7ms
400k | 22994242 | 92.7% | 12.8MB | 7.8ms
500k | 23114140 | 93.2% | 14.8MB | 7.8ms
600k | 23199457 | 93.6% | 17.1MB | 7.9ms
700k | 23235264 | 93.7% | 19.2MB | 7.9ms
800k | 23298257 | 94.0% | 21.1MB | 7.9ms
900k | 23324804 | 94.1% | 22.8MB | 7.9ms
1M | 23398837 | 94.4% | 24.5MB | 8.0ms
2M | 23764711 | 95.8% | 39.9MB | 8.0ms
3M | 24337552 | 98.1% | 55.9MB | 8.1ms
(4M) | 24528432 | 98.9% | 69MB | 9.6ms
4M_2 | 6959169 | 98.5% | 69MB | 9.7ms
(4.4M) | 24567908 | 99.1% | 73.5MB | 9.6ms

There are some special cases here:

  • (4M): This mixed training and test data in training with the first 4M of the 4.4M sentences, and then taking the last 1.4M of the 4.4M for testing. I believe in machine learning you are not supposed to test with the training data or the results will seem too good and not indicate any real world performance. Had to do it anyway, didn’t I 🙂
  • (4.4M): This one used the full 4.4M sentences to train and then tested on the subset 1.4M of the same set. So it’s a broken test again by mixing training data and test data.
  • 4M_2: For the evaluation, this one used the remaining number of sentences after taking out the 4M training sentences. So since the total is about 4.4M, which is actually more like 4.36M, the test set here was only about 360k sentences as opposed to the others where it was 1.4M, or 1.36M to be more accurate. But it is not mixing training and test data any more. Which is probably why it is slightly lower. But still an improvement so might as well train on the whole set at the end. The number of test tags here is 7066894 as opposed to the 24798043 in the 1.4M sentence test set.

And the resource use for training at 4M file size:

opennlp4m

So my 32GB of RAM is plenty, and as usual it is a single core implementation..

Next I should maybe look at putting this up as some service to call over the network. Some of these taggers actually already have support for it but anyway..

A few more points I collected on the way:

For the bigger datasets it is obviously easy to run out of memory. Looking at the code for the custom tagger trainer and the full 4.4M sentence training data, I figure I could scale this pretty high in terms of sentences processed by just storing the sentences into a document database and not in memory all at once. ElasticSearch would probably do just fine as I’ve been using it for other stuff as well. Then read the sentences from the database into memory as needed. The main reason the algorithm seems to need to keep the sentences in memory is to shuffle them randomly around for new training iterations. I could just shuffle the index numbers for sentences stored in the DB and read some smaller batches for training into memory. But I guess I am fine with my tagger for now. Similarly, the algorithm uses just a single core in training for now, but could be parallelized to process each sentence separately quite easily, making it “trivially parallel”. Would need to test the impact on accuracy though. Memory use could probably go lower using various optimizations, such as hashing the keys. Probably for both CPU and memory plenty of optimizations are possible, but maybe I will just use OpenNLP and let someone else worry about it :).
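The "shuffle the index numbers, not the sentences" idea could look something like the sketch below. Only the ids live in memory (an int array, so a few million of them is cheap), and each batch of ids would then be used to fetch the actual sentences from wherever they are stored. The database part is left out entirely, and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Random;

class ShuffledBatches {
  /** Returns the ids 0..n-1 in random order (Fisher-Yates shuffle). */
  static int[] shuffledIds(int n, long seed) {
    int[] ids = new int[n];
    for (int i = 0; i < n; i++) {
      ids[i] = i;
    }
    Random rnd = new Random(seed);
    for (int i = n - 1; i > 0; i--) {
      int j = rnd.nextInt(i + 1); // pick a position from 0..i
      int tmp = ids[i];
      ids[i] = ids[j];
      ids[j] = tmp;
    }
    return ids;
  }

  /** Returns the ids of one batch; the last batch may be shorter than batchSize. */
  static int[] batch(int[] ids, int batchIndex, int batchSize) {
    int from = batchIndex * batchSize;
    int to = Math.min(from + batchSize, ids.length);
    return Arrays.copyOfRange(ids, from, to);
  }
}
```

For a new training iteration, shuffledIds would be called again with a different seed, and the batches read one at a time so that only batchSize sentences are in memory at once.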

From the results of the different runs, there seems to be some consistency in LittlePOS running faster on bigger datasets, and OpenNLP slightly slower. The Stanford tagger seems to be quite a bit slower at 50ms, but that could again be due to configuration or some other issues. OpenNLP seems to get a better accuracy than my LittlePOS, and the model files are smaller. So the tradeoff in this case would be model size vs tagging speed. The tagging speed being faster with bigger datasets seems a bit odd but maybe more of the words become “unambiguous” and thus can be handled with a simple lookup on a map?

Finally, in the hopes of trying the stuff out on a completely different dataset, I tried to download the Finnish datasets for Universal Dependencies and test against those. I got this idea as the Syntaxnet stats showed using these as the test and training sets. Figured maybe it would give better results across sets taken from different sources. Unfortunately Universal Dependencies had different tag sets from the FinnTreeBank I used for training, and I ran out of motivation trying to map them together. Oh well, I just needed a POS tagger and I believe I now know enough on the topic and have a good enough starting point to look at the next steps..

But enough about that. Next, I think I will look at some more items in my NLP pipeline. Get back to that later…

Porting An Elasticsearch Java App to 5.X

Recently I was upgrading some stuff on my search app which makes use of Elasticsearch. The good people at ES had been promoting ES 5.0 for a long time as it was in beta, and now it was out of beta so I figured I might as well upgrade that as well. Did not turn out quite so simple. Some pointers from along the way.

There is a large set of breaking changes listed on their website. Only had to mess with a few. But there were a few points not clearly explained there. My experiences:

A few basic notes:

  • “string” mapping is now “keyword” or “text”. This is rather straightforward, although might take a few re-indexes.
  • “index” property in type mappings. The breaking changes list this as only supporting “true” or “false” as opposed to “analyzed” etc. from before. But the old style of “analyzed” still seems to work (at least no error). Not sure if I should investigate more but it seems to work for me still.

A bit more complicated:

Accessing fields in the Java API. I used to be able to query specific fields with something like client.prepareSearch().addFields(“a”, “b”).. and get the results from a SearchHit object by hit.getFields(). The addFields() methods are not completely gone but there is something called addStoredFields(), which did not work on my old mappings. It just returns null for the fields.

So now I need to mark my fields either as “stored” in the mapping or use source filtering to get the values. I guess in 2.X it was implicitly using source filtering. And if I mark the fields as “stored” then the addStoredFields() methods start to work.

So what is the difference between using stored fields and source filtering? The ES docs seem to discourage setting “stored” to true, but it does not always seem so clear. My understanding is that stored fields require separate reads from disk per field, whereas source filtering loads the whole document source in one go, and filters the fields from that. This can be good or bad, for example, if you have some very large content fields it may cause high overhead to just load some metadata. But if not, using stored fields might add more overhead. So it depends I guess.

I also guess this might be a good change as it makes the rationale for schema design more explicit.

Accessing dates in the Java API. Using the old approach of addFields() I could access dates stored as long values of epoch milliseconds with just long time = fields.get("doc_date").value(). This does not work anymore, as apparently ES uses a different format on disk, and source filtering just gives me the value as it was in the original document. I thought epoch long was how ES stored it on disk. Not sure if it ever was so, or if that was just my assumption. Well, the docs say something in that direction, but it is a bit up to interpretation.

So to access the date as epoch long, some conversions are needed now.
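The conversion itself is not complicated. Here is a minimal sketch of the idea in Python, assuming the value coming back from the source is either already a number of epoch milliseconds or an ISO-8601 style string. The function name to_epoch_millis is made up, and treating values without a timezone as UTC is my assumption, so adjust to whatever your documents actually contain.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Return epoch milliseconds for a date value as found in the document source."""
    if isinstance(value, (int, float)):
        return int(value)  # assumed to be epoch milliseconds already

    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions of fromisoformat() do not accept a trailing "Z".
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC

    # Integer division of timedeltas avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2016-11-05T12:30:00Z"))
```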

The plugin API has largely changed. So if you depend on some custom plugin, you might be out of luck, or you have to port the plugin yourself. I ended up porting one myself. I found it helpful to be able to look at some examples on GitHub. The source tree has several, even if that direct link is just the Polish analyzer.

The security manager cannot be disabled. In ES 1.x, it was not used. In 2.x, there was an option to disable it. In 5.x, that option has been removed. So if you use a plugin that needs to access JNA or some other lib that is already loaded by ES, you have to do tricks. At least for the security policy, you have to either unpack the ES jar file, modify the policy in it, and repack it, or modify the policy file of the JRE you use to run ES. That is, if your plugin needs special permissions, such as loading some specific native library.

That is all I remember for now. In a few weeks I might not remember even this much, which is usually why I write these things down 🙂

Automating Deployment of my (Micro)Services

So I read somewhere on the internets that microservices are great and we should all be doing them. Being an overly enthusiastic geek who likes to try all sorts of new fads and see how they work, I just had to go and give it a try, of course. So I proceeded to split my project into various smallish parts, connect these using gRPC, and see how it all runs.

I used gRPC because I like the efficiency, documentation and simplicity of protobufs. And Google has too much of a reputation anyway. Unfortunately the gRPC generated Java code just feels weird and oddly bloated. I also had some concurrency issues, although this might be due to my lack of understanding, as it seems the docs are not that great outside of Google (where you can just ask the authors or friends..).

I split my service into 10 smaller ones, did some tries, and settled on merging them back to 5 services. But how do I actually sensibly deploy this, compared to previously uploading a single dir? Then I remembered the next buzzword I keep hearing about, “Continuous Delivery”. Sweet, that must solve it for me, right?

Um, no. I must be missing something, as the CD terminology seems to come with just some conceptual-level descriptions but few concrete examples of how to do it. Maybe DockerHub or some other hype term. But I am still not on that boat, despite using various Docker images and building some myself. So what then? The most concrete references I found seemed to be along the lines of “I has some scripts” etc. OK, whatever, so I start cooking up some scriptz. In Python.

The Python configparser module seemed suitable. So I created a configuration file like this:

[service1]
ip=192.168.56.101
dst_dir=/home/randomguy/s/service1
src_dir=../service1
jar_prefix=s-service1
properties=s-service1.properties

[service2]
ip=192.168.56.102
dst_dir=/home/randomguy/s/service2
src_dir=../service2
jar_prefix=s-service2
properties=s-service2.properties

Read it with Python:

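import configparser

# "services.ini" is just a placeholder name for the file shown above
filename = "services.ini"
config = configparser.ConfigParser()
config.read(filename)

# each [section] of the file behaves like a dictionary of strings
service1_ip = config['service1']['ip']
service1_dst_dir = config['service1']['dst_dir']
service1_src_dir = config['service1']['src_dir']
service1_jar_prefix = config['service1']['jar_prefix']
service1_properties = config['service1']['properties']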

Doing this for all services gives the information on how to upload everything.
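Rather than repeating those lines per service, the sections can be looped over. A minimal sketch, assuming every section has the same five keys as in the file above. The load_services name and the inline EXAMPLE text are just for illustration; with a real file you would use config.read(filename) instead of read_string().

```python
import configparser

FIELDS = ("ip", "dst_dir", "src_dir", "jar_prefix", "properties")

EXAMPLE = """
[service1]
ip=192.168.56.101
dst_dir=/home/randomguy/s/service1
src_dir=../service1
jar_prefix=s-service1
properties=s-service1.properties

[service2]
ip=192.168.56.102
dst_dir=/home/randomguy/s/service2
src_dir=../service2
jar_prefix=s-service2
properties=s-service2.properties
"""


def load_services(config_text):
    """Parse the config text into {service name: {field: value}}."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # A missing key raises KeyError here, which is a reasonable early failure.
    return {
        name: {field: config[name][field] for field in FIELDS}
        for name in config.sections()
    }


services = load_services(EXAMPLE)
for name, settings in services.items():
    print(name, settings["ip"], settings["src_dir"], "->", settings["dst_dir"])
```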

With the paramiko Python package installed from pip (the SCPClient used in the upload functions comes from the separate scp package), next we are off to create the target directory if it does not exist:

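import paramiko

def mkdir_p(ssh, remote_directory):
    # Create remote_directory and any missing parents, like "mkdir -p".
    # The path is always treated as absolute (a leading "/" is added).
    with paramiko.SFTPClient.from_transport(ssh.get_transport()) as sftp:
        dir_path = ""
        for dir_folder in remote_directory.split("/"):
            if dir_folder == "":
                continue
            dir_path += "/{0}".format(dir_folder)
            try:
                # listing fails with IOError if the directory does not exist
                sftp.listdir(dir_path)
            except IOError:
                sftp.mkdir(dir_path)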

To upload a directory recursively:

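import os
from scp import SCPClient

def upload_dir(ssh, localpath, remotepath, name):
    # Upload the directory localpath/name, with its contents, under remotepath.
    local_dirpath = os.path.join(localpath, name)
    mkdir_p(ssh, remotepath)
    with SCPClient(ssh.get_transport()) as scp:
        scp.put(local_dirpath, remotepath, recursive=True)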

To upload a specific file:

def upload_file(ssh, localpath, remotepath):
    # remotepath is the target directory; it is created first if missing.
    mkdir_p(ssh, remotepath)
    with SCPClient(ssh.get_transport()) as scp:
        scp.put(localpath, remotepath)

Using this information and these code snippets, it is quite easy to build custom scripts for uploading specific service data to specific services. Instead of posting too much code here, I turned it into something a bit more generic and put it on GitHub:
https://github.com/mukatee/scp-uploader

It is the best thing since sliced bread. Of course it is…

And now you can tell me how it is really supposed to be done, thank you 🙂