Provide your research problem statement along with a list of at least 10 sources (at least 8 should be scholarly) that you plan to use in your business intelligence (BI) research paper. If you have more than 10, you can include them. Your list should include citations in proper APA format and a short description (1–2 sentences) of how or why you will use the source in your paper.
Keep the attached research paper as a reference and find additional references related to it.
You may select references from among those cited by the attached paper.
A Scalable Hybrid Research Paper Recommender System for Microsoft Academic

Anshul Kanakia, Microsoft Research, Redmond, Washington 98052, ankana@microsoft.com
Zhihong Shen, Microsoft Research, Redmond, Washington 98052, zhihosh@microsoft.com
Darrin Eide, Microsoft Research, Redmond, Washington 98052, darrine@microsoft.com
Kuansan Wang, Microsoft Research, Redmond, Washington 98052, kuansanw@microsoft.com

arXiv:1905.08880v1 [cs.DL] 21 May 2019
ABSTRACT
We present the design and methodology for the large scale hybrid paper recommender system used by Microsoft Academic. The system provides recommendations for approximately 160 million English research papers and patents. Our approach handles incomplete citation information while also alleviating the cold-start problem that often affects other recommender systems. We use the Microsoft Academic Graph (MAG), titles, and available abstracts of research papers to build a recommendation list for all documents, thereby combining co-citation and content based approaches. Tuning system parameters also allows for blending and prioritization of each approach which, in turn, allows us to balance paper novelty versus authority in recommendation results. We evaluate the generated recommendations via a user study of 40 participants, with over 2,400 recommendation pairs graded, and discuss the quality of the results using P@10 and nDCG scores. We see that there is a strong correlation between participant scores and the similarity rankings produced by our system, but that additional focus needs to be put towards improving recommender precision, particularly for content based recommendations. The results of the user survey and associated analysis scripts are made available via GitHub, and the recommendations produced by our system are available as part of the MAG on Azure to facilitate further research and light up novel research paper recommendation applications.
KEYWORDS
recommender system, word embedding, big data, k-means, clustering, document collection
1 INTRODUCTION AND RELATED WORK
Microsoft Academic (MA; https://preview.academic.microsoft.com) is a semantic search engine for academic entities [26]. The top level entities of the Microsoft Academic Graph (MAG) include papers and patents, fields of study, authors, affiliations (institutions and organizations), and venues (conferences and journals), as seen in Fig. 1. As of October, 2018, there are over 200 million total documents in the MA corpus, of which approximately 160 million are English papers or patents. These figures are growing rapidly [11].

The focus of this article is to present the recommender system and paper similarity computation platform for English research
papers developed for the MAG. We henceforth define the term 'paper' to mean English papers and patents in the MAG, unless otherwise noted. Research paper recommendation is a relatively nascent field [3] in the broader recommender system domain but its value to the research community cannot be overstated.

The current approach to knowledge discovery is largely manual — either following the citation graph of known papers or through human curated approaches. Additionally, live feeds such as publisher RSS feeds or manually defined news triggers are used to gain exposure to new research content. This often leads to incomplete literature exploration, particularly by novice researchers, since they are almost completely reliant on what they see online or on their advisors and peer networks for finding new relevant papers. Following the citation network of known papers to discover content often leads to information overload, resulting in dead ends and hours of wasted effort. The current manual approaches are not scalable, with tens of thousands of papers being published every day and the number of papers published globally increasing exponentially [11, 12]. To help alleviate this problem, a number of academic search engines have started adding recommender systems in recent years [1, 7, 18, 19, 28]. Still other research groups and independent companies are actively producing tools to assist with research paper recommendations for both knowledge discovery and citation assistance [2, 5, 13, 17]; both of these use cases are more-or-less analogous.
Figure 1: The Microsoft Academic Graph (MAG) website preview and statistics as of October, 2018.

Existing paper recommender systems suffer from a number of limitations. Besides Google Scholar, Microsoft Academic, Semantic Scholar, Web of Science, and a handful of other players, the vast majority of paper search engines are restricted to particular research domains, such as the PubMed database for Medicine & Biology and IEEE Xplore for Engineering disciplines. As such, it is impossible for recommender systems on these field-specific search sites to suggest cross-domain recommendations. Also, a number of proposed user recommender systems employ collaborative filtering for generating user recommendations [9]. These systems suffer from the well known cold-start problem (some collaborative filtering approaches assume paper authors will be system users and use citation information to indicate user intent; even so, new authors/users still suffer from the cold-start problem). There is little to no readily available data on metrics such as user attachment rates for research paper search and recommendation sites, and so it becomes difficult to evaluate the efficacy of collaborative filtering techniques without a solid active user base. Moreover, with the introduction of privacy legislation such as the General Data Protection Regulation (GDPR) in Europe, it is becoming increasingly difficult and costly to rely on user data — which is why the Microsoft Academic website does not store personal browsing information — making collaborative filtering all the more difficult. Finally, besides Google Scholar, the self-attested paper counts of other research paper databases are in the tens of millions, while the estimated number of papers from our system (as well as Google Scholar estimates) puts the total number of published papers easily in the hundreds of millions. Having a tenth of the available research corpus can heavily dilute recommendations, providing incomplete and unsatisfactory results. The MA paper recommender platform aims to alleviate some of the aforementioned shortcomings by,
(1) employing the entire MAG citation network and interdisciplinary corpus of over 200 million papers,
(2) using a combination of co-citation based and content embedding based approaches to maximize recommendation coverage over the entire corpus while circumventing the cold-start problem,
(3) and providing the computed paper recommendations to the broader research and engineering community so they can be analyzed and improved by other research groups.
We present two possible interaction modes with the MA paper recommender platform. The first mode is via the “Related Papers” section on the MA search engine paper details page. Users can browse the MA search site using the novel semantic interpretation engine powering the search experience to view their desired paper [26]. The paper details page, as seen in Fig. 2, contains a tab for browsing related papers (with dynamic filters) that is populated using the techniques mentioned here. The second mode of interaction is via Microsoft Azure Data Lake (ADL) services. The entire MAG is published under the ODC-By open data license (https://opendatacommons.org/licenses/by/1.0/) and is available for use via Microsoft Azure. Users can use scripting languages such as U-SQL and Python not just to access the pre-computed paper recommendations available as part of the MAG but also to generate on-the-fly paper recommendations for arbitrary text input in a performant manner. The means by which this functionality is achieved is described in the following sections.
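As an illustration of the second interaction mode, the sketch below reads a pre-computed recommendation file into Python with pandas and looks up the top results for one paper. The file name and column names (PaperRecommendations.txt with PaperId, RecommendedPaperId, Score) are assumptions made for illustration; consult the MAG schema documentation for the actual layout.

    import pandas as pd

    # Hypothetical file/column names; check the MAG schema docs for the actual layout.
    RECS_PATH = "mag/PaperRecommendations.txt"   # tab-separated MAG entity file (assumed)
    COLUMNS = ["PaperId", "RecommendedPaperId", "Score"]

    def load_recommendations(path: str = RECS_PATH) -> pd.DataFrame:
        """Load the pre-computed paper recommendation pairs into a DataFrame."""
        return pd.read_csv(path, sep="\t", names=COLUMNS)

    def top_k_for_paper(recs: pd.DataFrame, paper_id: int, k: int = 10) -> pd.DataFrame:
        """Return the k highest-scoring recommendations for a given paper."""
        subset = recs[recs["PaperId"] == paper_id]
        return subset.sort_values("Score", ascending=False).head(k)

    if __name__ == "__main__":
        recs = load_recommendations()
        print(top_k_for_paper(recs, paper_id=123456789))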
Figure 2: The “Related Papers” tab on the Microsoft Academic website paper details page, with additional filters visible on the left pane.
The complexity of developing a recommendation system for MA stems from the following feature requirements:
• Coverage: Maximize recommendation coverage over the MAG corpus.
• Scalability: Recommendation generation needs to be done with computation time and storage requirements in mind, as the MAG ingests tens of thousands of new papers each week.
• Freshness: All new papers regularly ingested by the MA data pipeline must be assigned related papers and may themselves be presented as ‘related’ to papers already in the corpus.
• User Satisfaction: There needs to be a balance between authoritative recommendations and novel recommendations so that newer papers are discoverable without compromising the quality of recommendations.
To tackle these requirements we have developed a hybrid recommender system that uses a tunable mapping function to combine both content-based (CB) and co-citation based (CcB) recommendations to produce a final static list of recommendations for every paper in the MAG. As described in [8], our approach employs a weighted, mixed hybridization approach. Our content-based approach is similar to recent work done on content embedding based citation recommendation [5] but differs mainly in the fact that we employ clustering techniques for additional speedup. Using purely pairwise content embedding similarity for nearest neighbor search is not viable, as this is an O(n²) problem over the entire paper corpus, which in our case would be over 2.56 × 10¹⁶ similarity computations.
2 RECOMMENDATION GENERATION
The process outlined in this section describes how the paper recommendations seen on the MA website, as well as those published in the MAG Azure database, are generated. More information on the MAG Azure database is available online (https://docs.microsoft.com/en-us/academic-services/graph/), but the important thing to note is that our entire paper recommendation dataset is openly available as part of this database, if desired.

The recommender system uses a hybrid approach of CcB and CB recommendations. The CcB recommendations show a high positive correlation with user generated scores, as discussed in section 3, and so we consider them to be of high quality. But the CcB approach suffers from low coverage, since paper citation information is often difficult to acquire. To combat this issue we also use content embedding similarity based recommendations. While CB recommendations can be computationally expensive, and of lower quality than CcB recommendations, they have the major advantages of freshness and coverage. Only paper metadata, including titles, keywords, and available abstracts, is needed to generate these recommendations. Since all papers in the MAG are required to have a title, we can generate content embeddings for all English documents in the corpus, relying just on the title if need be.
The resulting recommendation lists of both approaches are finally combined using a parameterized weighting function (see section 2.3), which allows us to directly compare each recommendation pair from either list and join both lists to get a final paper recommendation list for nearly every paper in the graph. These lists are dynamically updated as new information comes into the MAG, such as new papers, paper references, or new paper metadata such as abstracts. The recommendation lists are thus kept fresh from week to week.
2.1 Co-citation Based Recommendations
Consider a corpus of papers P = {p_1, p_2, p_3, ..., p_n}. We use c_{i,j} = 1 to denote that p_i cites p_j, and 0 otherwise. The co-citation count between p_i and p_j is defined as:

    cc_{i,j} = Σ_{k=1}^{n} c_{i,k} c_{j,k}    (1)

When cc_{i,j} ≥ 1, we say that p_i is a co-citation of p_j, and vice-versa. Notice that cc_{i,j} = cc_{j,i}. This method presumes that papers with higher co-citation counts are more related to each other; in other words, if two papers are often cited together, they are more likely to be related. This approach to recommendation generation is not new; it was originally proposed in 1973 by Small [27]. Since then, others such as [14] have built upon the original approach by incorporating novel similarity measures. While we stay true to the original approach in this paper, we are investigating other co-citation and network based similarity measures as future work.
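Since Eq. 1 is exactly an entry of C·Cᵀ for a binary citation matrix C, the counts can be computed with a single sparse matrix product. The following is a minimal sketch using SciPy; the toy citation data is invented for illustration.

    import numpy as np
    from scipy import sparse

    # Toy citation matrix C: C[i, k] = 1 if paper i cites paper k (data invented for illustration).
    # Rows/columns are paper indices; the real MAG citation graph would be built the same way.
    rows = [0, 0, 1, 1, 2]          # citing papers
    cols = [3, 4, 3, 4, 4]          # cited papers
    C = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))

    # Eq. 1: cc[i, j] = sum over k of C[i, k] * C[j, k], i.e. the (i, j) entry of C @ C.T.
    CC = (C @ C.T).toarray()
    np.fill_diagonal(CC, 0)         # a paper trivially shares all references with itself; ignore it

    print(CC)                       # papers 0 and 1 share two references, so CC[0, 1] == 2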
CcB similarity empirically resembles human behavior when it comes to searching for similar or relevant material. Having access to paper reference information is a requirement for generating CcB recommendations. This presents a challenge, since reference lists are often incomplete or simply unavailable for many papers, particularly for older papers that were published in print but not digitally. Moreover, CcB recommendations are by their very nature biased towards older papers with more citations. CcB recommendations prove ineffective for new papers that do not have any citations yet and therefore cannot have any co-citations. The MAG contains complete or partial reference information for 31.19% of papers, with each paper averaging approximately 20 references. As a result, only about 32.5% of papers have at least one co-citation. Nevertheless, CcB recommendations tend to be of high quality and therefore cannot be overlooked.
2.2 Content Based Recommendations
2.2.1 Generating paper embeddings. CB recommendations provide a few crucial benefits to overcome the information limitations of CcB recommendations as well as the privacy concerns, system complexity, and cold-start problem inherent in other user based recommender system approaches such as collaborative filtering. CB recommendations only require metadata about a paper that is easy to find, such as its title, keywords, and abstract. We use this textual data from the MAG to generate word embeddings using the well known word2vec library [22, 23]. These word embeddings are then combined to form paper embeddings, which can then be directly compared using established similarity metrics like cosine similarity [21].
Each paper is vectorized into a paper embedding by first training word embeddings, w, using the word2vec library. The parameters used in word2vec training are provided in Table 1. The training data for word2vec are the titles and abstracts of all English papers in the MA corpus. At the same time, we compute the term frequency (TF) vectors for each paper as well as the inverse document frequency (IDF) for each term in the training set. A normalized linear combination of word vectors, weighted by their respective TF-IDF values, is used to generate a paper embedding, D. Terms in titles and keywords are weighted twice as much as terms in abstracts, as seen in Eq. 2. This approach has been applied before for CB document embedding generation, as seen in [24], where the authors assigned a 3× weight to words in paper titles compared to words in the abstract. Finally, the document embedding is normalized: D̂ = D/|D|. Since we use cosine similarity as a measure of document relevance [21], embedding normalization makes the similarity computation step more efficient, since the norm of each paper embedding does not need to be computed every time it is compared to other papers; the value of the dot product between embeddings is sufficient.
    D = 2 · Σ_{w ∈ title ∪ keywords} TFIDF(w) · ŵ  +  Σ_{w ∈ abstract} TFIDF(w) · ŵ    (2)
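Below is a minimal sketch of Eq. 2, assuming word vectors and TF-IDF weights have already been computed (for example with a word2vec implementation and a TF-IDF vectorizer); the function name, variable names, and toy inputs are illustrative only.

    import numpy as np

    def paper_embedding(title_kw_terms, abstract_terms, word_vecs, tfidf):
        """Eq. 2: TF-IDF weighted sum of unit word vectors, with title/keyword terms
        weighted 2x, followed by L2 normalization of the resulting document vector."""
        dim = len(next(iter(word_vecs.values())))
        D = np.zeros(dim)
        for terms, weight in ((title_kw_terms, 2.0), (abstract_terms, 1.0)):
            for w in terms:
                if w in word_vecs and w in tfidf:
                    w_hat = word_vecs[w] / np.linalg.norm(word_vecs[w])   # unit word vector
                    D += weight * tfidf[w] * w_hat
        norm = np.linalg.norm(D)
        return D / norm if norm > 0 else D                                # D_hat = D / |D|

    # Toy example (vectors and TF-IDF values invented for illustration):
    word_vecs = {"recommender": np.array([0.2, 0.9]), "system": np.array([0.8, 0.1]),
                 "survey": np.array([0.5, 0.5])}
    tfidf = {"recommender": 1.7, "system": 0.9, "survey": 1.2}
    print(paper_embedding(["recommender", "system"], ["survey", "system"], word_vecs, tfidf))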
2.2.2 Clustering paper embeddings. Our major contribution to the approach of CB recommendation is improving scalability using clustering for very large datasets. The idea of using spherical k-means clustering for clustering large text datasets is presented in [10]. The authors of [10] do a fantastic job of explaining the inherent properties of the document clusters formed and make theoretical claims about concept decompositions generated using this approach. Besides the aforementioned benefits, our desire to use k-means clustering stems from the speedup it provides over traditional nearest neighbor search in the continuous paper embedding space. We utilize spherical k-means clustering [15] to drastically reduce the number of pairwise similarity computations between papers when generating recommendation lists. Using trained clusters drops the cost of paper recommendation generation from an O(n²) operation in the total number of papers to an O(n · (|c| + λ_c)) operation, where |c| is the trained cluster count and λ_c is the average size of a single paper cluster. Other possible techniques commonly used in CB paper recommender system optimization, such as model classifiers [4] and trained neural networks [5], were investigated but ultimately rejected due to their considerable memory and computation footprint compared to the simpler k-means clustering approach, particularly for the very large corpus size we are dealing with.
Cluster centroids are initialized with the help of the MAG topic stamping algorithm [25]. Papers in the MAG are stamped according to a learned topic hierarchy, resulting in a directed acyclic graph of topics, such that every topic has at least one parent, with the root being the parent or ancestor of every single field of study. As of October, 2018 there are 229,370 topics in the MAG, but when the centroids for clustering were originally generated — almost a year ago — there were about 80,000 topics. Of these, 23,533 were topics with no children in the hierarchy, making them the most focused or narrow topics with minimal overlap with other fields. The topic stamping algorithm also assigns a confidence value to the topics stamped for each paper. By using papers stamped with a high confidence leaf node topic we can guess an initial cluster count as well as generate initial centroids for these clusters. Therefore, our initial centroid count is k = 23,533. Cluster centroids are initialized by taking at most 1,000 random papers for each leaf node topic and averaging their respective paper embeddings together. Another initialization mechanism would be to take descriptive text (such as Wikipedia entries) for each leaf node topic and generate an embedding – in much the same way paper embeddings are generated using trained word embeddings – to act as an initial cluster centroid. The centroid initialization method choice is ultimately dependent on the perceived quality of the final clusters, and we found that averaging paper embeddings for leaf node topics provided sufficient centroids for initializing clusters. The rest of the hyperparameters used for k-means clustering are provided in Table 1.

Table 1: Parameters used for spherical k-means clustering and word2vec training.
    k-means:  init. clusters (k) = 23,533; max iter. (n) = 10; min error (δ) = 10⁻³
    word2vec: method = skipgram; emb. size = 256; loss fn. = ns; window size = 10; max iter. = 10; min-count cutoff = 10; sample = 10⁻⁵; negative = 10
K-means clustering then progresses as usual until the clusters converge or we complete a certain number of training epochs. Cluster sizes range anywhere from 51 papers to just over 300,000, with 93% of all papers in the MAG belonging to clusters of size 35,000 or less. The distribution of cluster sizes is important, since we do not want very large clusters to dominate computation time when generating CB recommendations. Remember that clustering reduces the computation complexity of CB recommendation generation from O(n²) to O(n · (|c| + λ_c)) because each paper embedding now need only be compared to the embeddings of other papers in the same cluster. Here |c| = k and λ_c is the average cluster size. For a single paper, the complexity of recommendation generation is just the second term, i.e. |c| + λ_{c_i}, where λ_{c_i} is the size of the cluster c_i that the paper belongs to; so if λ_{c_i} gets too large, it dominates the computation time. In our pipeline we found that the largest 100 clusters ranged in size from about 40,000 to 300,000 and took up more than 40% of the total computation time of the recommendation process. We therefore limit the cluster sizes that we generate recommendations for to 35,000. For now, papers belonging to clusters larger than this threshold (about 7% of all papers) only have CcB recommendation lists generated. In the future, we plan on investigating sub-clustering or hierarchical clustering techniques to break up the very large clusters so as to be able to generate CB recommendations for them as well.
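A minimal sketch of the spherical k-means step is shown below, assuming L2-normalized paper embeddings and topic-derived initial centroids as described above. The loop structure and convergence test are simplifications for illustration, not the production pipeline.

    import numpy as np

    def spherical_kmeans(X, centroids, max_iter=10, min_error=1e-3):
        """Spherical k-means: X and centroids are L2-normalized row vectors, so the dot
        product is cosine similarity. Returns (cluster assignments, final centroids)."""
        C = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        for _ in range(max_iter):
            sims = X @ C.T                       # cosine similarity of each paper to each centroid
            labels = sims.argmax(axis=1)         # assign each paper to its most similar centroid
            new_C = C.copy()
            for j in range(C.shape[0]):
                members = X[labels == j]
                if len(members):
                    mean = members.mean(axis=0)
                    new_C[j] = mean / np.linalg.norm(mean)   # re-normalize the centroid
            converged = np.abs(new_C - C).max() < min_error  # simple convergence test
            C = new_C
            if converged:
                break
        return labels, C

    # Toy run with random unit vectors standing in for paper embeddings (illustration only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 256))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    init = X[rng.choice(len(X), size=8, replace=False)]      # stand-in for topic-derived centroids
    labels, centroids = spherical_kmeans(X, init)
    print(np.bincount(labels))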
2.3 Combined Recommendations
Finally, both CcB and CB candidate sets for a paper are combined to create a unified final set of recommendations for papers in the MAG. CcB candidate sets have co-citation counts associated with each paper-recommendation pair. These co-citation counts are mapped to a score between (0, 1) to make it possible to directly compare them with the CB similarity metric. The mapping function used is a modified logistic function, as seen in Eq. 3.

    σ(cc_{i,j}) = 1 / (1 + e^{θ(τ − cc_{i,j})})    (3)
θ and τ are tunable parameters controlling the slope and offset of the logistic sigmoid, respectively. Typically, these values can be estimated using the mean and variance of the domain distribution or input distribution to this function, under the assumption that the input distribution is Gaussian-like. While the distribution of co-occurrence counts tended to be more of a Poisson-like distribution with a long tail, the majority of co-occurrence counts (the mass of the distribution) was sufficiently Gaussian-like. We settled on values τ = 5 and θ = 0.4 based on the mean co-occurrence count of all papers and a factor of its standard deviation, respectively. In general, changing the tunable parameters of Eq. 3 allows one to weigh CcB versus CB recommendations, resulting in different final sets of recommendation lists that balance authoritative recommendations from the CcB method with novel recommendations from the CB method. Once CcB and CB recommendation similarities are mapped to the same range [0, 1], comparing them becomes trivial, and generating a unified recommendation list involves ordering relevant papers from both lists based solely on their similarity to the target paper.
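The sketch below illustrates Eq. 3 with the stated parameter values (τ = 5, θ = 0.4) and a simple merge of the two candidate lists. The candidate-list data structures are assumptions, and keeping the larger of the two scores for papers appearing in both lists is an illustrative choice rather than the paper's specified rule.

    import math

    THETA, TAU = 0.4, 5.0   # slope and offset values reported in the paper

    def sigma(cc: float) -> float:
        """Eq. 3: map a co-citation count to a (0, 1) score comparable with CB cosine similarity."""
        return 1.0 / (1.0 + math.exp(THETA * (TAU - cc)))

    def combine(ccb: dict, cb: dict, top_k: int = 10) -> list:
        """Merge CcB candidates (paper_id -> co-citation count) with CB candidates
        (paper_id -> cosine similarity) and return the top_k (paper_id, score) pairs."""
        scores = {pid: sigma(cc) for pid, cc in ccb.items()}
        for pid, sim in cb.items():
            scores[pid] = max(scores.get(pid, 0.0), sim)   # keep the stronger of the two signals
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Toy candidate lists (invented): co-citation counts on the left, cosine similarities on the right.
    print(combine({101: 12, 102: 3, 103: 1}, {102: 0.83, 104: 0.61, 105: 0.42}))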
3 USER STUDY
We evaluated the results of the recommender system via an online survey. The survey was set up as follows. On each page of the study, participants were presented with a pair of papers. The top paper was one that had been identified in the MAG as being authored (or co-authored) by the survey participant, while the bottom paper was a recommendation generated using the hybrid recommender platform described in the previous section. Metadata for both papers as well as hyperlinks to the paper details pages on the MA website were also presented to the participant on this page. The participant was then asked to judge — on a scale of 1 to 5, with 1 being not relevant to 5 being very relevant — whether they thought the bottom paper was relevant to the top paper (see Fig. 3). Participants could decide to skip a survey page if they were not comfortable providing a score and carry on with the remainder of the survey.
The dataset of paper/recommended-paper pairs to show to a particular participant was generated by randomly selecting at most 5 of that participant's authored papers. This was done to ensure familiarity with the papers the participants were asked to grade, which we thought would make the survey less time consuming, thereby resulting in a higher response rate. Note that while participants were guaranteed to be authors of the papers, they may not have been authors of the recommended papers. For each of a participant's 5 papers, we then generated at most 10 recommendations using the CcB approach and 10 recommendations using the CB approach. Some newer papers may have had fewer than 10 co-citation recommendations. This resulted in each participant having to rate at most 100 recommendations. All participants were active computer science researchers, and so the survey, as a whole, was heavily biased towards rating computer science papers. We wish to extend this survey to other domains as future work, since the MAG contains papers and recommendations from tens of thousands of different research domains. For now, we limited our scope to computer science due to familiarity, the ease of participant access, and confidence in participant expertise in this domain.
4 RESULTS AND DISCUSSION
The user survey was sent to all full-time researchers at Microsoft Research, and a total of 40 users responded to the survey, resulting in 2,409 scored recommendation pairs collected. Of these, 984 were CcB recommendations, 15 were recommendation pairs that were both content and co-citation, and 1,410 were CB recommendation pairs. The raw result dataset is available on GitHub (https://github.com/akanakia/microsoft-academic-paper-recommender-user-study). Since at most 10 recommendations were presented to a user using each of the two methods, we computed P@10 for CcB and CB recommendations as well as P@10 for combined recommendations.

Since we did not include any type of explicit score normalization for participants during the survey, Table 2 shows precision computed assuming, in one row, a user score of at least 3 as a true positive result and, in another row, a score of at least 4 as a true positive result. Recall that users were asked to score recommended paper pairs on a scale of 1 being not relevant to 5 being most relevant. We also compute the normalized discounted cumulative gain (nDCG) for each of the three methods. Note that we use exponential gain, (2^score − 1) / log₂(rank + 1), instead of linear gain when computing DCG.
Table 2: Evaluation metrics for CcB, CB, and combined recommender methods. P@10-N indicates that a user score of at least N on the [1, 5] scale is considered a true positive.

              CcB      CB       Combined
    P@10-3    0.315    0.226    0.533
    P@10-4    0.271    0.145    0.41
    nDCG      0.851    0.789    0.891
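To make the metric definitions explicit, the sketch below computes P@10 and nDCG with the exponential gain (2^score − 1)/log₂(rank + 1) on an invented list of user scores; it is not intended to reproduce the study numbers.

    import math

    def precision_at_k(scores, k=10, threshold=3):
        """P@k: fraction of the top-k recommendations whose user score meets the threshold."""
        top = scores[:k]
        return sum(s >= threshold for s in top) / len(top)

    def ndcg(scores, k=10):
        """nDCG@k with exponential gain (2^score - 1) / log2(rank + 1), ranks starting at 1."""
        def dcg(vals):
            return sum((2 ** s - 1) / math.log2(r + 1) for r, s in enumerate(vals, start=1))
        ideal = sorted(scores, reverse=True)
        return dcg(scores[:k]) / dcg(ideal[:k])

    # Invented user scores (1-5) for one paper's ranked recommendation list, best-ranked first.
    user_scores = [5, 4, 2, 3, 1, 4, 2, 5, 1, 3]
    print(precision_at_k(user_scores, threshold=3), precision_at_k(user_scores, threshold=4))
    print(round(ndcg(user_scores), 3))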
We generated Fig. 4 to see how the similarities computed using the MA recommender system lined up with user scores from the study. Each bar in the figure is generated by first aggregating the similarity scores for all paper recommendation pairs that were given a particular score by users. The first column contains all paper recommendation pairs given a score of 1, and so on. Each column is then divided into sections by binning the paper recommendation pairs according to the similarity computed using the combined recommender method; e.g., the orange section of the leftmost bar denotes all paper recommendation pairs with a similarity between [0.6, 0.7) that were given a score of 1 by the user study participants. While the absolute values of the similarity are not very important, what is important to gather from this figure is that it shows a clear positive correlation between the similarity values computed by the hybrid recommender platform and user scores. This would seem to indicate that recommendation pairs with higher computed similarity are more likely to be relevant to users, which is the desired outcome for any recommender system. This fact is reinforced by the nDCG values of each of the recommender methods. A combined nDCG of 0.891 indicates a strong correlation between the system's computed rankings and those observed from the survey participants.

On the other hand, the observed precision values indicate that there is room for improving user satisfaction with the presented recommendations, particularly for the CB method. While we expected CcB recommendations to outperform CB results, having CB recommendations be half as precise as CcB recommendations would seem to indicate that additional effort needs to be spent on improving CB recommender quality.
Figure 3: A screenshot of the online recommender system survey.
Figure 4: The distribution of similarity ranges over user responses, ranging from 1 being not relevant to 5 being most relevant.
5 CONCLUSION
A natural question to ask about this approach is why not use other vectorization techniques such as doc2vec [20] or fastText [6], or even incorporate deep learning language models such as ULMFiT [16]? While we settled on word2vec for the current production system, we are constantly evaluating other techniques and have this task set aside as future work. A good experiment would be to generate a family of embeddings and analyze recommendation results, perhaps with the aid of a follow-up user study, to understand the impact of different document vectorization techniques on the resulting recommendation set. Another avenue for further research lies in tuning the weights and hyper-parameters of the recommender, such as the θ and τ parameters in the co-citation mapping function (Eq. 3). We hypothesize that a reinforcement learning approach could be used to learn these parameters, given the user study as labeled ground-truth data for the training model.
In the broader scope of evaluating research paper recommender systems, there is a notable lack of literature that compares existing deployed technologies. Furthermore, there is a general lack of data on metrics such as user adoption and satisfaction, and no consensus on which approaches — like the content and co-citation hybrid presented in this paper, or collaborative filtering and graph analysis, to name a few others — prove the most promising in helping to tackle this problem. Part of the reason for this is that large knowledge graphs and associated recommender systems are often restricted behind paywalls, not open access or open source, and hence difficult to analyze and compare. We hope to at least partly alleviate this problem by providing the entire MAG and precomputed paper recommendations under the open data license, ODC-By, so that other researchers may easily use our data and reproduce the results presented here, as well as conduct their own research and analysis on our knowledge graph.
In conclusion, we presented the scalable hybrid paper recommender platform used by Microsoft Academic, which uses co-citation and content based recommendations to maximize coverage, scalability, freshness, and user satisfaction. We examined the quality of results produced by the system via a user study and showed a strong correlation between our system's computed similarities and user scores for pairs of paper recommendations. Finally, we made the results of our user study, as well as the actual recommendation lists used by MA, available to researchers to analyze and to help further research in research paper recommender systems.
REFERENCES
[1] Luiz Barroso. 2006. Exploring the scholarly neighborhood [Blog Post]. https://googleblog.blogspot.com/2006/08/exploring-scholarly-neighborhood.html.
[2] Joeran Beel, Akiko Aizawa, Corinna Breitinger, and Bela Gipp. 2017. Mr. DLib: recommendations-as-a-service (RaaS) for academia. In Digital Libraries (JCDL), 2017 ACM/IEEE Joint Conference on. IEEE, 1–2.
[3] Joeran Beel, Bela Gipp, Stefan Langer, and Corinna Breitinger. 2016. Research-paper recommender systems: a literature survey. International Journal on Digital Libraries 17, 4 (Nov. 2016), 305–338.
[4] Steven Bethard and Dan Jurafsky. 2010. Who should I cite: learning literature search models from citation behavior. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 609–618.
[5] Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar. 2018. Content-Based Citation Recommendation. In NAACL HLT 2018: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 238–251.
[6] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016).
[7] Kurt D. Bollacker, Steve Lawrence, and C. Lee Giles. 1999. A system for automatic personalized tracking of scientific literature on the web. In Proceedings of the Fourth ACM Conference on Digital Libraries. ACM, 105–113.
[8] Robin D. Burke. 2002. Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction 12, 4 (2002), 331–370.
[9] Tsung Teng Chen and Maria R. Lee. 2018. Research Paper Recommender Systems on Big Scholarly Data. In PKAW. 251–260.
[10] Inderjit S. Dhillon and Dharmendra S. Modha. 2001. Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning 42 (2001), 143–175.
[11] Yuxiao Dong, Hao Ma, Zhihong Shen, and Kuansan Wang. 2017. A Century of Science: Globalization of Scientific Collaborations, Citations, and Innovations. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1437–1446.
[12] Santo Fortunato, Carl T. Bergstrom, Katy Borner, James A. Evans, Dirk Helbing, Stasa Milojevic, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, and Albert Laszlo Barabasi. 2018. Science of science. Science 359, 6379 (2018).
[13] Bela Gipp, Joeran Beel, and Christian Hentschel. 2009. Scienstein: A Research Paper Recommender System. ICETiC'09: International Conference on Emerging Trends in Computing (2009), 309–315.
[14] Khalid Haruna, Maizatul Akmar Ismail, Abdullahi Baffa Bichi, Victor Chang, Sutrisna Wibawa, and Tutut Herawan. 2018. A Citation-Based Recommender System for Scholarly Paper Recommendation. In International Conference on Computational Science and Its Applications. 514–525.
[15] Kurt Hornik, Ingo Feinerer, and Martin Kober. 2012. Spherical k-Means Clustering. Journal of Statistical Software 50, 1 (2012), 1–22.
[16] Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. 328–339.
[17] Wenyi Huang, Zhaohui Wu, Prasenjit Mitra, and C. Lee Giles. 2014. RefSeer: A citation recommendation system. In Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on. IEEE, 371–374.
[18] Kris Jack. 2012. Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley. https://www.slideshare.net/KrisJack/mahout-becomes-a-researcher-large-scale-recommendations-at-mendeley.
[19] Ajith Kodakateri Pudhiyaveetil, Susan Gauch, Hiep Luong, and Josh Eno. 2009. Conceptual recommender system for CiteSeerX. In Proceedings of the Third ACM Conference on Recommender Systems. ACM, 241–244.
[20] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. International Conference on Machine Learning (2014), 1188–1196.
[21] Maake Benard Magara, Sunday O. Ojo, and Tranos Zuva. 2018. A comparative analysis of text similarity measures and algorithms in research paper recommender systems. In 2018 Conference on Information Communications Technology and Society (ICTAS).
[22] Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013).
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[24] Cristiano Nascimento, Alberto H.F. Laender, Altigran S. da Silva, and Marcos André Gonçalves. 2011. A source independent framework for research paper recommendation. ACM Press, 297.
[25] Zhihong Shen, Hao Ma, and Kuansan Wang. 2018. A web-scale system for scientific knowledge exploration. Meeting of the Association for Computational Linguistics (2018).
[26] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Paul Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web. 243–246.
[27] Henry Small. 1973. Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of The American Society for Information Science 24, 4 (July 1973), 265–269.
[28] Roberto Torres, Sean M. McNee, Mara Abel, Joseph A. Konstan, and John Riedl. 2004. Enhancing digital libraries with TechLens+. In Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, 228–236.