paragraphs = htl.html`<div class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-bN97Pc-haAclf"><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 28.4314%; top: 12.2475%; width: 43.1373%; height: 1.76768%;">Reconstruction of Threaded Conversations
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 35.4575%; top: 14.2677%; width: 29.085%; height: 1.76768%;">in Online Discussion Forums
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 28.5948%; top: 18.0556%; width: 42.8105%; height: 1.76768%;">Erik Aumayr and Jeffrey Chan and Conor Hayes
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 37.9085%; top: 19.697%; width: 24.183%; height: 1.38889%;">Digital Enterprise Research Institute
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 42.9739%; top: 20.9596%; width: 14.0523%; height: 1.51515%;">NUI Galway, Ireland
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 32.0261%; top: 22.3485%; width: 35.9477%; height: 1.64141%;">Email: {erik.aumayr, jkc.chan, conor.hayes}@deri.org
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 25%; top: 27.5253%; width: 6.53595%; height: 1.26263%;">Abstract
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 29.6717%; width: 36.1111%; height: 1.38889%;">Online discussion boards, or Internet forums, are a significant
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 30.9343%; width: 36.1111%; height: 2.65152%;">part of the Internet. People use Internet forums to post ques-
tions, provide advice and participate in discussions. These
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 33.4596%; width: 36.1111%; height: 3.91414%;">online conversations are represented as threads, and the con-
versation trees within these threads are important in under-
standing the behaviour of online users. Unfortunately, the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 37.2475%; width: 36.1111%; height: 1.38889%;">reply structures of these threads are generally not publicly
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 38.5101%; width: 36.1111%; height: 2.65152%;">accessible or not maintained. Hence, in this paper, we in-
troduce an efficient and simple approach to reconstruct the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 41.0354%; width: 36.1111%; height: 2.65152%;">reply structure in threaded conversations. We contrast its ac-
curacy against three baseline algorithms, and show that our
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 43.5606%; width: 36.1111%; height: 2.65152%;">algorithm can accurately recreate the in and out degree dis-
tributions of forum reply graphs built from the reconstructed
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 46.0859%; width: 9.80392%; height: 1.38889%;">reply structures.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 21.4052%; top: 49.2424%; width: 13.8889%; height: 1.51515%;">1 Introduction
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 51.2626%; width: 39.3791%; height: 1.51515%;">Internet forums are an important part of the Web. Along
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 52.6515%; width: 39.3791%; height: 1.38889%;">with Twitter, web logs and wikis, they provide a platform for
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 53.9141%; width: 39.3791%; height: 4.29293%;">questions to be asked and answered, information to be disseminated
and public discussions on all types of topics. According
to Internet Brands<sup>1</sup>, 11% of Internet traffic in 2009
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 58.0808%; width: 39.3791%; height: 1.51515%;">consisted of visits to online forums, showing forums are still
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 59.4697%; width: 17.9739%; height: 1.51515%;">an integral part of the Web.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 10.2941%; top: 60.8586%; width: 37.9085%; height: 1.51515%;">In forums, conversations are represented as sequences of
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 62.2475%; width: 39.3791%; height: 1.51515%;">posts, or threads, where the posts reply to one or more earlier
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 63.6364%; width: 39.3791%; height: 1.51515%;">posts. For example, Figure 1 shows a thread from the poker
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 65.0253%; width: 39.3791%; height: 1.51515%;">forum on www.boards.ie. It consists of a sequence of posts
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 66.4141%; width: 39.2157%; height: 1.51515%;">discussing how to become a Texas Hold’em poker dealer.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 67.803%; width: 36.2745%; height: 1.51515%;">A link exists between two posts if one is a reply to the other.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 69.1919%; width: 39.3791%; height: 5.68182%;">The threaded nature of forums allows us to follow the con-
versations, and thus study interesting problems. For exam-
ple, users can be profiled and analysed based on their reply-
ing behaviour, which is extracted from the reply structure
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 74.7475%; width: 39.3791%; height: 1.51515%;">of forums. In (Chan, Daly, and Hayes 2010), users were
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 76.1364%; width: 39.3791%; height: 1.51515%;">profiled using this method, then grouped together into user
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 77.5253%; width: 39.3791%; height: 1.26263%;">roles of common behaviour. The roles were then used to
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 78.9141%; width: 39.3791%; height: 2.90404%;">decompose forums into the percentages of users playing particular
roles. Another sample application is in topic and trend
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 81.6919%; width: 39.3791%; height: 1.51515%;">tracking (Allan 2002). By recovering the reply structure, we
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 83.0808%; width: 39.3791%; height: 1.38889%;">can follow the actual conversation stream in threads, which
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 85.101%; width: 39.3791%; height: 1.51515%;">Copyright © 2011, Association for the Advancement of Artificial
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 8.66013%; top: 86.4899%; width: 28.7582%; height: 1.89394%;">Intelligence (www.aaai.org). All rights reserved.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 11.2745%; top: 87.7525%; width: 25.9804%; height: 1.26263%;"><sup>1</sup>http://www.internetbrands.com
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 27.5253%; width: 39.3791%; height: 1.51515%;">might not be in the order in which the posts were made. As can be
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 52.1242%; top: 28.9141%; width: 38.7255%; height: 1.51515%;">seen, the reply structure of threads has many applications.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 30.4293%; width: 39.3791%; height: 2.77778%;">There are many forums and many datasets of forums on-
line. However, the reply structure of threads is not always
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 33.0808%; width: 39.3791%; height: 1.51515%;">available. The board system may fail to log it properly, the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 34.4697%; width: 39.5425%; height: 1.51515%;">providers may not maintain it, or it may not be publicly available
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 35.8586%; width: 39.3791%; height: 1.51515%;">or may even have been lost. Therefore, in this paper,
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 37.2475%; width: 39.3791%; height: 1.51515%;">we propose a new method to reconstruct the reply structure
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 38.6364%; width: 12.5817%; height: 1.51515%;">of posts in forums.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 40.1515%; width: 37.7451%; height: 1.51515%;">Prior work in reconstructing the thread structure is limited
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 41.5404%; width: 39.3791%; height: 2.90404%;">(Wang et al. 2008). Existing methods either focus on detecting
questions and answers in forums (Cong et al. 2008), which is
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 44.3182%; width: 39.3791%; height: 4.29293%;">only one type of thread in online forums, or only use con-
tent to reconstruct thread structure, which results in low ac-
curacy (Wang et al. 2008). We propose a new approach
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 48.4848%; width: 39.3791%; height: 1.51515%;">to reconstructing thread structures. It uses a set of simple
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 49.8737%; width: 39.3791%; height: 1.51515%;">features and a classifier (a decision tree) to reconstruct the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 51.2626%; width: 39.3791%; height: 1.51515%;">reply structure of threads. We evaluate the accuracy of the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 52.6515%; width: 39.3791%; height: 2.90404%;">algorithm against the existing algorithm and a baseline algorithm,
using traditional notions of precision and recall and the ability
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 55.4293%; width: 39.3791%; height: 2.90404%;">of the algorithms to recreate the in and out degree distributions
of reply graphs built from the reconstructed reply
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 52.1242%; top: 58.2071%; width: 39.2157%; height: 1.51515%;">structure. We also analyse how well the algorithms perform
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 59.596%; width: 39.3791%; height: 1.51515%;">in recreating the local in and out degrees and the clustering
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 60.9848%; width: 34.9673%; height: 1.51515%;">coefficient for individual vertices in the reply graphs.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 62.5%; width: 30.8824%; height: 1.51515%;">In summary, the contributions of this work are:
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 52.1242%; top: 64.5202%; width: 39.2157%; height: 1.51515%;">• We propose a classification approach to reconstruct reply
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 65.9091%; width: 37.9085%; height: 1.51515%;">behaviours in threads that uses content and non-content
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 67.298%; width: 5.88235%; height: 1.26263%;">features.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 52.1242%; top: 69.4444%; width: 39.2157%; height: 1.51515%;">• We show that the algorithm can accurately recreate the in and
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 70.8333%; width: 37.7451%; height: 1.51515%;">out degree distributions of the forum reply graphs that are
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 72.2222%; width: 31.0458%; height: 1.51515%;">created from the reconstructed reply structures.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 52.1242%; top: 74.3687%; width: 39.2157%; height: 1.51515%;">• We show that the difference in accuracy between our algorithm and
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 75.7576%; width: 37.7451%; height: 1.51515%;">a baseline algorithm results in significant differences in the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 77.1465%; width: 37.7451%; height: 1.51515%;">local degree and clustering coefficient values of the forum
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 78.5354%; width: 9.47712%; height: 1.51515%;">reply graphs.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 53.5948%; top: 80.6818%; width: 37.7451%; height: 1.51515%;">The remainder of this paper is organised as follows. In Section 2
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 82.0707%; width: 39.3791%; height: 1.51515%;">we describe related work, then we explain our approach to
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 83.4596%; width: 39.3791%; height: 1.51515%;">reconstructing threaded conversations in Section 3. In the
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 84.8485%; width: 39.3791%; height: 1.51515%;">next section, we present our evaluation and contrast the results
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 86.2374%; width: 39.3791%; height: 1.51515%;">of the different approaches. Finally, Section 5 concludes this
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 51.9608%; top: 87.6263%; width: 26.634%; height: 1.51515%;">paper and presents possible future work.
</p><p class="ndfHFb-c4YZDc-cYSp0e-DARUcf-Df1ZY-eEGnhe" style="left: 27.1242%; top: 2.14646%; width: 45.7516%; height: 1.38889%;">Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media
</p></div>`