<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cornelius Puschmann&#039;s Blog &#187; Code</title>
	<atom:link href="http://blog.ynada.com/category/code/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.ynada.com</link>
	<description>My new blog on Linguistics, Digital Humanities and Scholarly Communication on the Internet</description>
	<lastBuildDate>Wed, 18 Jan 2012 17:54:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Graphing Twitter friends/followers with R (updated yet again)</title>
		<link>http://blog.ynada.com/864</link>
		<comments>http://blog.ynada.com/864#comments</comments>
		<pubDate>Thu, 22 Dec 2011 10:12:46 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rstats]]></category>
		<category><![CDATA[sna]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=864</guid>
		<description><![CDATA[Those of you following my occasional updates here know that I have previously posted code for graphing Twitter friend/follower networks using R (post #1, post #2). Kai Heinrich was kind enough to send me some updated code for doing so using a newer version of the extremely useful twitteR package. His very crisp, yet thoroughly [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you following my occasional updates here know that I have previously posted code for graphing Twitter friend/follower networks using <a href="http://www.r-project.org/">R</a> (<a href="http://blog.ynada.com/247">post #1</a>, <a href="http://blog.ynada.com/279">post #2</a>). <a href="http://tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_wirtschaftswissenschaften/wi/wiid/professur/wiid_heinrich">Kai Heinrich</a> was kind enough to send me some updated code for doing so using a newer version of the extremely useful <a href="http://cran.r-project.org/web/packages/twitteR/">twitteR</a> package. His very crisp, yet thoroughly documented script is pasted below.</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=864&amp;download=graph_friendsfollowers.R">graph_friendsfollowers.R</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p8642"><td class="code" id="p864code2"><pre class="r" style="font-family:monospace;"># Script for graphing Twitter friends/followers
# by Kai Heinrich (kai.heinrich@mailbox.tu-dresden.de) 
&nbsp;
# load the required packages
&nbsp;
library(&quot;twitteR&quot;)
library(&quot;igraph&quot;)
&nbsp;
# HINT: For the tkplot() function to work on a Mac, you need to install
#       the Tcl/Tk build for X11
#       (get it here: http://cran.us.r-project.org/bin/macosx/tools/)
#
# Get user information with the twitteR function getUser().
# Instead of your own name, you can use any other username as well.
&nbsp;
start&lt;-getUser(&quot;YOUR_USERNAME&quot;) 
&nbsp;
# Get friend and follower names by first fetching the IDs (getFriendIDs(), getFollowerIDs())
# and then looking up the names (lookupUsers())
&nbsp;
friends.object&lt;-lookupUsers(start$getFriendIDs())
followers.object&lt;-lookupUsers(start$getFollowerIDs())
&nbsp;
# Retrieve the names of your friends and followers from the friend
# and follower objects. You can limit the number of friends and followers by adjusting the
# size of the selected data with [1:n], where n is the number of followers/friends
# that you want to visualize. If you leave out the [1:n] subscript, all of your
# friends and/or followers will be visualized.
&nbsp;
n&lt;-20 
friends &lt;- sapply(friends.object[1:n],name)
followers &lt;- sapply(followers.object[1:n],name)
&nbsp;
# Create a data frame that relates friends and followers to you for expression in the graph
relations &lt;- merge(data.frame(User='YOUR_NAME', Follower=friends), 
data.frame(User=followers, Follower='YOUR_NAME'), all=T)
&nbsp;
# Create graph from relations.
g &lt;- graph.data.frame(relations, directed = T)
&nbsp;
# Assign labels to the graph (=people's names)
V(g)$label &lt;- V(g)$name
&nbsp;
# Plot the graph using plot() or tkplot(). Remember the HINT at the
# beginning if you are using Mac OS X
tkplot(g)</pre></td></tr></table></div>
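<p>For readers who would rather prototype in Python, the edge list that the script above hands to graph.data.frame() can be sketched roughly as follows. This is only an illustration under my own assumptions: build_relations() and its arguments are made-up names, not part of twitteR or igraph, and the sample names are placeholders.</p>

```python
# Sketch: build the directed edge list that the R script passes to graph.data.frame().
# build_relations() and its arguments are hypothetical names, not part of twitteR or igraph.

def build_relations(user, friends, followers):
    """Return sorted, de-duplicated (User, Follower) pairs, roughly mirroring the R data frame."""
    edges = set()
    for friend in friends:
        # same orientation as data.frame(User='YOUR_NAME', Follower=friends)
        edges.add((user, friend))
    for follower in followers:
        # same orientation as data.frame(User=followers, Follower='YOUR_NAME')
        edges.add((follower, user))
    return sorted(edges)


if __name__ == "__main__":
    # placeholder names, for illustration only
    for source, target in build_relations("me", ["alice", "bob"], ["bob", "carol"]):
        print(source + "\t" + target)
```

<p>Someone who is both a friend and a follower ends up with two edges, one in each direction, which is what you want in a directed graph.</p>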

]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/864/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tiny snippet of Python code to extract the friends and followers of a given user</title>
		<link>http://blog.ynada.com/784</link>
		<comments>http://blog.ynada.com/784#comments</comments>
		<pubDate>Wed, 21 Sep 2011 19:50:57 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[social data]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=784</guid>
		<description><![CDATA[Ahead of publishing my TwitterFunctions library of R code (which is a constant work in progress), I thought I&#8217;d put up some really short Python code for getting a person&#8217;s friends and followers. Both scripts rely on Tweepy, my favorite Python implementation of the Twitter API. Install Python (works on Windows as well, not just on [...]]]></description>
			<content:encoded><![CDATA[<p>Ahead of publishing my TwitterFunctions library of R code (which is a constant work in progress), I thought I&#8217;d put up some really short Python code for getting a person&#8217;s friends and followers. Both scripts rely on <a href="http://code.google.com/p/tweepy/">Tweepy</a>, my favorite Python implementation of the <a href="https://dev.twitter.com/">Twitter API</a>. Install <a href="http://www.python.org/">Python</a> (works on Windows as well, not just on Mac/Linux) and then Tweepy on top of that, and you are good to go with these two scripts, which can be executed from the command line with<br />
<code>python get_friends.py <i>username</i></code></p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=784&amp;download=get_friends.py">get_friends.py</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p7845"><td class="code" id="p784code5"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #ff7700;font-weight:bold;">import</span> tweepy
&nbsp;
<span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> friend <span style="color: #ff7700;font-weight:bold;">in</span> tweepy.<span style="color: black;">api</span>.<span style="color: black;">friends</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">print</span> friend.<span style="color: black;">screen_name</span></pre></td></tr></table></div>


<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=784&amp;download=get_followers.py">get_followers.py</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p7846"><td class="code" id="p784code6"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #ff7700;font-weight:bold;">import</span> tweepy
&nbsp;
<span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> follower <span style="color: #ff7700;font-weight:bold;">in</span> tweepy.<span style="color: black;">api</span>.<span style="color: black;">followers</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">print</span> follower.<span style="color: black;">screen_name</span></pre></td></tr></table></div>
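<p>If you redirect the output of the two scripts into text files, comparing the lists takes only a few more lines. The following is a rough sketch written for Python 3, not part of the original scripts: it assumes one screen name per line, and read_names() and classify() are hypothetical helper names, not Tweepy functions.</p>

```python
# Sketch: compare the output of get_friends.py and get_followers.py.
# Assumes each script's output was saved as one screen name per line;
# read_names() and classify() are hypothetical helpers, not part of Tweepy.

def read_names(path):
    """Read one screen name per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def classify(friends, followers):
    """Split names into mutual, friends-only and followers-only groups."""
    friend_set = set(friends)
    follower_set = set(followers)
    return {
        "mutual": sorted(friend_set.intersection(follower_set)),
        "friends_only": sorted(friend_set.difference(follower_set)),
        "followers_only": sorted(follower_set.difference(friend_set)),
    }


if __name__ == "__main__":
    # placeholder names, for illustration only
    groups = classify(["alice", "bob"], ["bob", "carol"])
    for group in sorted(groups):
        print(group + ": " + ", ".join(groups[group]))
```

<p>With real data you would call classify(read_names("friends.txt"), read_names("followers.txt")), where the two file names are whatever you redirected the script output to.</p>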

]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/784/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Visualizing China&#8217;s internet growth with R and the Google Visualization API</title>
		<link>http://blog.ynada.com/226</link>
		<comments>http://blog.ynada.com/226#comments</comments>
		<pubDate>Tue, 14 Jun 2011 14:38:33 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[global internet usage]]></category>
		<category><![CDATA[googlevis]]></category>
		<category><![CDATA[rstats]]></category>
		<category><![CDATA[UN]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=226</guid>
		<description><![CDATA[I&#8217;ve been following the development of googleVis, the implementation of the Google Visualization API for R, for a bit now. The library has a lot of potential as a bridge between R (where data processing happens) and HTML (where presentation is [increasingly] happening). A growing number of visualization frameworks are on the market and all [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been following the development of <a href="http://cran.r-project.org/web/packages/googleVis/index.html">googleVis</a>, the implementation of the <a href="http://code.google.com/apis/chart/">Google Visualization API</a> for <a href="http://www.r-project.org/">R</a>, for a bit now. The library has a lot of potential as a bridge between R (where data processing happens) and HTML (where presentation is [increasingly] happening). A growing number of visualization frameworks are on the market and all have their perks (e.g. <a href="http://www-958.ibm.com/">Many Eyes</a>, <a href="http://simile.mit.edu/">Simile</a>, <a href="http://flare.prefuse.org/">Flare</a>). I guess I was so inspired by the Hans Rosling <del datetime="2011-06-14T12:41:51+00:00">Show</del> <a href="http://www.youtube.com/watch?v=hVimVzgtD6w">TED talk</a>, which makes such great use of bubble charts, that I wanted to try the Google Vis API for that chart type alone. There&#8217;s more, however, if you don&#8217;t care much for floating bubbles: neat chart variants include the geochart, area charts and the usual classics (bar, pie, etc.). Check out <a href="http://code.google.com/apis/chart/interactive/docs/gallery.html">the chart gallery</a> for an overview.</p>
<p>So here are my internet growth charts:</p>
<p>(1) <a href="http://files.ynada.com/charts/globalinetusage-motion.html"><strong>Motion chart showing the growth of the global internet population since 2000 for 208 countries</strong></a></p>
<p>(2) <a href="http://files.ynada.com/charts/globalinetusage-geo.html"><strong>World map showing global internet user statistics for 2009 for 208 countries</strong></a></p>
<p>Data source: <a href="http://data.un.org">data.un.org</a> (ITU database). I&#8217;ve merged two tables from the database into one (absolute numbers and percentages) and cleaned the data up a bit. The resulting tab-separated CSV file is available <a href="http://files.ynada.com/charts/globalinetusage.csv">here</a>.</p>
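<p>In case you want to reproduce the merge step, here is a minimal Python 3 sketch of joining two tab-separated tables on country and year. It is not the procedure I actually used: the column names Users and Percent, the function merge_tables() and the sample rows are all assumptions made for illustration, and the real ITU export may be labelled differently.</p>

```python
# Sketch: join two tab-separated tables on (Country, Year).
# The column names "Users" and "Percent" and merge_tables() are assumptions
# made for illustration; the real ITU export may label things differently.
import csv
import io


def merge_tables(absolute_tsv, percent_tsv):
    """Return the rows present in both tables, keyed on Country and Year."""
    percent = {}
    for row in csv.DictReader(io.StringIO(percent_tsv), delimiter="\t"):
        percent[(row["Country"], row["Year"])] = row["Percent"]
    merged = []
    for row in csv.DictReader(io.StringIO(absolute_tsv), delimiter="\t"):
        key = (row["Country"], row["Year"])
        if key in percent:
            merged.append({"Country": key[0], "Year": key[1],
                           "Users": row["Users"], "Percent": percent[key]})
    return merged


if __name__ == "__main__":
    # made-up placeholder rows, not real ITU figures
    absolute = "Country\tYear\tUsers\nExampleland\t2009\t100\n"
    share = "Country\tYear\tPercent\nExampleland\t2009\t10.0\n"
    for row in merge_tables(absolute, share):
        print("\t".join([row["Country"], row["Year"], row["Users"], row["Percent"]]))
```

<p>Rows that appear in only one of the two tables are dropped here; whether that is the right call depends on how complete your data is.</p>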
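<p>And here&#8217;s the R code for rendering the chart. For the second chart, you basically swap gvisMotionChart() for gvisGeoChart() and adjust the arguments accordingly (gvisGeoChart() identifies countries through its locationvar argument rather than idvar and timevar); the rest is the same.</p>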

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=226&amp;download=gvis_charts.R">gvis_charts.R</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p2268"><td class="code" id="p226code8"><pre class="r" style="font-family:monospace;">library(&quot;googleVis&quot;)
n &lt;- read.csv(&quot;netstats.csv&quot;, sep=&quot;\t&quot;)
nmotion &lt;- gvisMotionChart(n, idvar=&quot;Country&quot;, timevar=&quot;Year&quot;, options=list(width=1024, height=768))
plot(nmotion)</pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/226/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Python code for automatically archiving a number of people&#8217;s tweets</title>
		<link>http://blog.ynada.com/143</link>
		<comments>http://blog.ynada.com/143#comments</comments>
		<pubDate>Sat, 11 Jun 2011 10:10:52 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[social data]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=143</guid>
		<description><![CDATA[I meant to post this a month or so ago, when I was conducting my study of casual tweeting, but didn&#8217;t get to it. No harm in posting it now, I guess &#8212; code doesn&#8217;t go bad, fortunately. Note: this requires Linux/Unix/OSX, Python 2.6 and the tweepy library. It might also work on Windows, but [...]]]></description>
			<content:encoded><![CDATA[<p>I meant to post this a month or so ago, when I was conducting my <a href="http://blog.ynada.com/160">study of casual tweeting</a>, but didn&#8217;t get to it. No harm in posting it now, I guess &#8212; code doesn&#8217;t go bad, fortunately.</p>
<p><em>Note: this requires Linux/Unix/OSX, Python 2.6 and the <a href="http://code.google.com/p/tweepy/">tweepy library</a>. It might also work on Windows, but I haven&#8217;t checked.</em></p>
<p><strong>1. Fetching a single user&#8217;s tweets with twitter_fetch.py</strong></p>
<p>The purpose of the script below is to automatically retrieve all new tweets by one or more users, where &#8220;new&#8221; means all tweets that have been added since the last round of archiving. If the script is called for the first time for a given user, it will try to retrieve all available tweets for that person. It relies on the <a href="http://code.google.com/p/tweepy/">tweepy</a> package for Python, which is one of a number of <a href="http://dev.twitter.com/pages/libraries">libraries providing access to the Twitter API</a>. In case you&#8217;re looking for a library for <a href="http://www.r-project.org/">R</a>, check out <a href="http://cran.r-project.org/web/packages/twitteR/index.html">twitteR</a>.</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=143&amp;download=twitter_fetch.py">twitter_fetch.py</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p14311"><td class="code" id="p143code11"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> tweepy
&nbsp;
<span style="color: #808080; font-style: italic;"># make sure that the directory 'Tweets' exists, this is</span>
<span style="color: #808080; font-style: italic;"># where the tweets will be archived</span>
wdir = <span style="color: #483d8b;">'Tweets'</span>
<span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
id_file = <span style="color: #dc143c;">user</span> + <span style="color: #483d8b;">'.last_id'</span>
timeline_file = <span style="color: #dc143c;">user</span> + <span style="color: #483d8b;">'.timeline'</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">exists</span><span style="color: black;">&#40;</span>wdir + <span style="color: #483d8b;">'/'</span> + id_file<span style="color: black;">&#41;</span>:
	f = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>wdir + <span style="color: #483d8b;">'/'</span> + id_file, <span style="color: #483d8b;">'r'</span><span style="color: black;">&#41;</span>
	since = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>f.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
	f.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	tweets = tweepy.<span style="color: black;">api</span>.<span style="color: black;">user_timeline</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, since_id=since<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">else</span>:
	tweets = tweepy.<span style="color: black;">api</span>.<span style="color: black;">user_timeline</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>tweets<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>:
	last_id = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>tweets<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: #008000;">id</span><span style="color: black;">&#41;</span>
	tweets.<span style="color: black;">reverse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
	<span style="color: #808080; font-style: italic;"># write tweets to file</span>
	f = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>wdir + <span style="color: #483d8b;">'/'</span> + timeline_file, <span style="color: #483d8b;">'a+'</span><span style="color: black;">&#41;</span>
	<span style="color: #ff7700;font-weight:bold;">for</span> tweet <span style="color: #ff7700;font-weight:bold;">in</span> tweets:
		output = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>tweet.<span style="color: black;">created_at</span><span style="color: black;">&#41;</span> + <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span> + tweet.<span style="color: black;">text</span>.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\r</span>'</span>, <span style="color: #483d8b;">' '</span><span style="color: black;">&#41;</span>.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'utf-8'</span><span style="color: black;">&#41;</span> + <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span> + tweet.<span style="color: black;">source</span>.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'utf-8'</span><span style="color: black;">&#41;</span> + <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span>
		f.<span style="color: black;">write</span><span style="color: black;">&#40;</span>output<span style="color: black;">&#41;</span>
		<span style="color: #ff7700;font-weight:bold;">print</span> output
	f.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
	<span style="color: #808080; font-style: italic;"># write last id to file</span>
	f = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>wdir + <span style="color: #483d8b;">'/'</span> + id_file, <span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
	f.<span style="color: black;">write</span><span style="color: black;">&#40;</span>last_id<span style="color: black;">&#41;</span>
	f.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">else</span>:
	<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'No new tweets for '</span> + <span style="color: #dc143c;">user</span></pre></td></tr></table></div>

<p>The code is pretty straightforward. I wrote it without really knowing Python beyond the bare essentials and relying heavily on <a href="http://ipython.scipy.org/moin/">IPython</a>&#8217;s code completion. Actual retrieval of tweets happens in a single line:</p>
<pre>
tweets = tweepy.api.user_timeline(user)
</pre>
<pre>&nbsp;</pre>
<p>The rest of the script is devoted to managing the data and making sure only new tweets are retrieved. This is done via the <em>since_id</em> parameter which is fed the last recorded id that has been saved to the user&#8217;s id file in the previous round of archiving. There are more elegant ways of doing this, but any improvements are up to you. <img src='http://blog.ynada.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
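<p>To make the bookkeeping easier to follow (and to test without hitting the API), here is a rough Python 3 sketch of the same since_id logic with the network call factored out. It is an illustration, not a replacement for the script above: archive_new(), the fetch argument and the (id, text) tuple format are hypothetical, and fetch merely stands in for a call like tweepy.api.user_timeline.</p>

```python
# Sketch of the since_id bookkeeping, written for Python 3 and decoupled from Tweepy.
# fetch is a stand-in for a call like tweepy.api.user_timeline; archive_new()
# and the (id, text) tuple format are hypothetical, for illustration only.
import os


def archive_new(user, fetch, wdir="Tweets"):
    """Append tweets newer than the stored id and return how many were written."""
    os.makedirs(wdir, exist_ok=True)  # unlike the original, create the directory if needed
    id_path = os.path.join(wdir, user + ".last_id")
    since = None
    if os.path.exists(id_path):
        with open(id_path) as handle:
            since = int(handle.read())
    tweets = fetch(user, since)  # expected newest first, like the Twitter API
    if not tweets:
        return 0
    timeline_path = os.path.join(wdir, user + ".timeline")
    with open(timeline_path, "a", encoding="utf-8") as handle:
        for tweet_id, text in reversed(tweets):  # write oldest first
            handle.write(str(tweet_id) + "\t" + text.replace("\n", " ") + "\n")
    with open(id_path, "w") as handle:
        handle.write(str(tweets[0][0]))  # newest id becomes the next since_id
    return len(tweets)
```

<p>The point of the design is that the id file is only rewritten after the timeline file has been appended to, so a run that finds nothing new leaves both files untouched.</p>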
<p><strong>2. Fetching a bunch of different users&#8217; tweets with twitter_fetch_all.sh</strong></p>
<p>Second comes a very simple bash script. The only thing it does is call twitter_fetch.py once for each user in a list of people you want to track. Again, there are probably other ways of doing this, but I wanted to keep the functions of the two different scripts separate.</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=143&amp;download=twitter_fetch_all.sh">twitter_fetch_all.sh</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p14312"><td class="code" id="p143code12"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/bin/bash</span>
<span style="color: #666666; font-style: italic;"># This will run twitter_fetch.py for each entry in the twitter_users[] array. Add any number of twitter_users[NUMBER]=&quot;USER&quot; lines below</span>
<span style="color: #666666; font-style: italic;"># to archive additional accounts.</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># --- twitter user list ---</span>
twitter_users<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #000000;">0</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>=<span style="color: #ff0000;">&quot;SomeUser&quot;</span>
twitter_users<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #000000;">1</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>=<span style="color: #ff0000;">&quot;SomeOtherUser&quot;</span>
twitter_users<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #000000;">2</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>=<span style="color: #ff0000;">&quot;YetAnotherUser&quot;</span>
twitter_users<span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #000000;">3</span><span style="color: #7a0874; font-weight: bold;">&#93;</span>=<span style="color: #ff0000;">&quot;YouGetTheIdea&quot;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># --- execute twitter_fetch.py ---</span>
<span style="color: #000000; font-weight: bold;">for</span> twitter_user <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #800000;">${twitter_users[*]}</span>
<span style="color: #000000; font-weight: bold;">do</span>
	<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;Getting tweets for user <span style="color: #007800;">$twitter_user</span>&quot;</span>
	python twitter_fetch.py <span style="color: #007800;">$twitter_user</span>
	<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;Done.&quot;</span>
	<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;&quot;</span>
<span style="color: #000000; font-weight: bold;">done</span></pre></td></tr></table></div>

<p>You should place this in the same directory as twitter_fetch.py and modify it to suit your needs.</p>
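<p>If you would rather keep everything in Python, the same loop could look roughly like the sketch below. Treat it as an assumption-laden alternative, not as what I ran: build_commands() and run_all() are hypothetical names, and it expects a python executable on the path and twitter_fetch.py in the current directory.</p>

```python
# Sketch: the same per-user loop in Python instead of bash.
# build_commands() and run_all() are hypothetical names; twitter_fetch.py
# is the script from this post and is assumed to sit in the current directory.
import subprocess


def build_commands(users, script="twitter_fetch.py"):
    """Return one argument list per user, skipping empty names."""
    return [["python", script, user] for user in users if user]


def run_all(users):
    """Run the fetch script once per user and collect the exit codes."""
    codes = {}
    for command in build_commands(users):
        print("Getting tweets for user " + command[-1])
        codes[command[-1]] = subprocess.call(command)
        print("Done.")
    return codes
```

<p>Calling run_all(["SomeUser", "SomeOtherUser"]) then does what the bash script does, and the returned exit codes tell you which accounts failed.</p>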
<p><strong>3. Automating the whole thing with a cronjob</strong></p>
<p>Finally, here&#8217;s a <a href="http://en.wikipedia.org/wiki/Cron">cron</a> directive I used to automate the process and log the result in case any errors occur. Read the linked Wikipedia article if you&#8217;re unfamiliar with cron; it&#8217;s a very convenient way of automating tasks on Linux/Unix.</p>
<pre>
0 * * * * bash /root/twitter_fetch_all.sh >/root/twitter_fetch.log 2>&amp;1
</pre>
<pre>&nbsp;</pre>
<p>(Yes, I&#8217;m running this as root. Because I can. And because it&#8217;s an <a href="http://aws.amazon.com">EC2</a> instance with nothing else on it anyway.)</p>
<p>Hope it&#8217;s useful to someone, let me know if you have any questions. <img src='http://blog.ynada.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/143/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting comments from a Blogger.com blog post with R</title>
		<link>http://blog.ynada.com/336</link>
		<comments>http://blog.ynada.com/336#comments</comments>
		<pubDate>Sun, 20 Feb 2011 18:57:30 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[atom]]></category>
		<category><![CDATA[blogger.com]]></category>
		<category><![CDATA[comments]]></category>
		<category><![CDATA[rstats]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=336</guid>
		<description><![CDATA[Note #1: Check out this very useful post by Najko Jahn describing how to extract links to blogs via Google Blog Search. Note #2: I&#8217;ll update the code below once I find the time using Najko&#8217;s cleaner XPath-based solution. Recently I&#8217;ve been working with comments as part of the project on science blogging we&#8217;re doing [...]]]></description>
			<content:encoded><![CDATA[<p>Note #1: Check out <a href="http://libreas.wordpress.com/2011/01/31/credit-to-whom-credit-is-due-bloganalysen-mit-google-und-r/">this very useful post</a> by <a href="http://twitter.com/najkojahn">Najko Jahn</a> describing how to extract links to blogs via <a href="http://blogsearch.google.com">Google Blog Search</a>.</p>
<p>Note #2: I&#8217;ll update the code below once I find the time using Najko&#8217;s cleaner XPath-based solution.</p>
<p>Recently I&#8217;ve been working with comments as part of the project on science blogging we&#8217;re doing at the <a href="http://nfgwin.uni-duesseldorf.de">Junior Researchers Group &#8220;Science and the Internet&#8221;</a>. I wrote the script below to quickly extract comments from <a href="http://en.wikipedia.org/wiki/Atom_(standard)">Atom</a> feeds, such as those generated by <a href="http://blogger.com">Blogger.com</a>.</p>
<p>The code isn&#8217;t exactly pretty, mostly because I didn&#8217;t use an XML parser to properly read the data, instead resorting to brute-force pattern matching, but it gets the job done. Two easier (and cleaner) routes would have been to a) get the data directly from the <a href="http://code.google.com/intl/en/apis/gdata/">Google Data API</a> (doesn&#8217;t work as far as I can tell, since there seems to be no implementation for R*) or b) parse the data specifically as Atom (doesn&#8217;t work as &#8212; annoyingly &#8212; there is no specific parsing support for Atom in R). Properly parsing the XML, while not rocket science, seemed more complex than necessary to me, especially given the fact that Atom should be common enough.</p>
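<p>For comparison, this is roughly what the parser-based route looks like in Python 3, using only the standard library. It is a sketch under my own assumptions, not a port of the R script: parse_comments() is a hypothetical helper, and the choice of fields (published date, author name, content) simply follows what I extract from each entry.</p>

```python
# Sketch: read comments from an Atom feed with a real XML parser.
# Uses only the standard library; parse_comments() is a hypothetical helper
# and the field selection (published, author name, content) is an assumption.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_comments(xml_text):
    """Return a list of (published, author, text) tuples, one per entry."""
    root = ET.fromstring(xml_text)
    comments = []
    for entry in root.iter(ATOM + "entry"):
        published = entry.findtext(ATOM + "published", default="")
        author = entry.findtext(ATOM + "author/" + ATOM + "name", default="")
        text = entry.findtext(ATOM + "content", default="")
        comments.append((published, author, text.strip()))
    return comments
```

<p>Because the parser decodes character entities itself, most of the manual clean-up that pattern matching requires is no longer needed.</p>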
<p>Scraping, by the way, makes for a very nice exercise for a pragmatic programming class (the one you might teach in the Digital Humanities or Information Science), since you teach people how to get their hands on data they can then use as part of their own projects.</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=336&amp;download=extract-comments-atom.R">extract-comments-atom.R</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p33614"><td class="code" id="p336code14"><pre class="r" style="font-family:monospace;">rm(list=ls(all=T));
library(&quot;RCurl&quot;);
&nbsp;
rounds &lt;- 3;
perpage &lt;- 100;
feedurl &lt;- &quot;http://rrresearch.blogspot.com/feeds/2171542729230739732/comments/default&quot;;
&nbsp;
for (i in 1:rounds) {
	thisurl &lt;- paste(feedurl, &quot;?start-index=&quot;, ((i - 1) * perpage + 1), &quot;&amp;max-results=&quot;, perpage, sep=&quot;&quot;);
	if (exists(&quot;feeddata&quot;)==T) feeddata &lt;- c(feeddata, getURL(thisurl)) else feeddata &lt;- getURL(thisurl);
}
&nbsp;
buffer &lt;- paste(feeddata, collapse=&quot; &quot;);
&nbsp;
entries &lt;- unlist(strsplit(buffer, &quot;&lt;entry&gt;&quot;));
entries &lt;- gsub(&quot;&lt;/feed&gt;.*?$&quot;, &quot;&quot;, entries);
entries &lt;- entries[-1];
&nbsp;
# get rid of quotes, excess whitespace etc
entries &lt;- gsub(&quot;\n&quot;, &quot;&quot;, entries, perl=T);
entries &lt;- gsub(&quot;&amp;amp;#39;&quot;, &quot;\'&quot;, entries, perl=T);
entries &lt;- gsub(&quot;&amp;amp;quot;&quot;, &quot;\&quot;&quot;, entries, perl=T);
entries &lt;- gsub(&quot;(&amp;lt;br /&amp;gt;)+&quot;, &quot; &quot;, entries, perl=T);
entries &lt;- gsub(&quot;&amp;lt;&quot;, &quot;&lt;&quot;, entries, perl=T);
entries &lt;- gsub(&quot;&amp;gt;&quot;, &quot;&gt;&quot;, entries, perl=T);
&nbsp;
# extract date, author and text of comments
dates &lt;- gsub(&quot;^&lt;id&gt;.*?&lt;published&gt;([0-9T:\\.-]{29,})&lt;/published&gt;.*?&lt;/entry&gt;(&lt;/feed&gt;)?$&quot;, &quot;\\1&quot;, entries, perl=T);
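# keep only the date and the time (HH:MM:SS); note that this drops the UTC offset given in the feed
dates &lt;- paste(substr(dates, 1, 10), substr(dates, 12, 19));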
dates.px &lt;- as.POSIXct(dates, origin=&quot;1970-01-01&quot;, tz=&quot;GMT-1&quot;);
dates.f &lt;- strftime(dates.px, &quot;%d %b %H:%M&quot;);
users &lt;- gsub(&quot;^&lt;id&gt;.*?&lt;name&gt;(.*?)&lt;/name&gt;.*?&lt;/entry&gt;(&lt;/feed&gt;)?$&quot;, &quot;\\1&quot;, entries, perl=T);
comments &lt;- gsub(&quot;^&lt;id&gt;.*?&lt;content type='html'&gt;(.*?)&lt;/content&gt;.*?&lt;/entry&gt;(&lt;/feed&gt;)?$&quot;, &quot;\\1&quot;, entries, perl=T);
posters &lt;- sort(table(users), decreasing=T);
&nbsp;
d &lt;- data.frame(date=dates.f, user=users, comment=comments);
&nbsp;
# write two tables, one containing all the comments and the other a simple frequency list
write.csv(d, file=&quot;blog-comments.csv&quot;);
write.csv(posters, &quot;blog-posters.csv&quot;);</pre></td></tr></table></div>

<p>* I spoke a bit too soon there. There <i>is</i> <a href="http://r-forge.r-project.org/projects/rgoogledata/">an implementation for Google Data with R</a>, but it doesn&#8217;t support Blogger.com and many other interesting services. Hopefully such an implementation will be provided eventually. That, or I just quit whining and learn Python&#8230;</p>
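<p>The paging loop in the script above can also be written without the <code>exists()</code> check. What follows is only a minimal sketch that builds the page URLs in one go, assuming the same <code>start-index</code> and <code>max-results</code> parameters as the script; nothing is downloaded here.</p>

```r
# Values mirroring the script above; this sketch only builds the URLs.
feedurl <- "http://rrresearch.blogspot.com/feeds/2171542729230739732/comments/default"
rounds <- 3
perpage <- 100

# start-index is 1-based, so the pages begin at 1, 101, 201, ...
start.index <- (seq_len(rounds) - 1) * perpage + 1
urls <- paste0(feedurl, "?start-index=", start.index, "&max-results=", perpage)
print(urls)
```

<p>The resulting vector could then be fetched page by page, for example with <code>sapply(urls, getURL)</code>, and collapsed into a single buffer as before.</p>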
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/336/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Dynamic Twitter graphs with R and Gephi (clip and code)</title>
		<link>http://blog.ynada.com/425</link>
		<comments>http://blog.ynada.com/425#comments</comments>
		<pubDate>Sun, 02 Jan 2011 16:46:19 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[clip]]></category>
		<category><![CDATA[dynamicgraph]]></category>
		<category><![CDATA[gephi]]></category>
		<category><![CDATA[gexf]]></category>
		<category><![CDATA[mla09]]></category>
		<category><![CDATA[rstats]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=425</guid>
		<description><![CDATA[Note 1/4/11: I&#8217;ve updated the code below after discovering a few bugs. Back in October when Jean Burgess first posted a teaser of the Gephi dynamic graph feature applied to Twitter data, I thought right away that this was going to bring Twitter visualization to an entirely new level. When you play around with graph [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Note 1/4/11: I&#8217;ve updated the code below after discovering a few bugs.</strong></p>
<p>Back in October when <a href="http://www.cci.edu.au/profile/jean-burgess">Jean Burgess</a> first <a href="http://www.mappingonlinepublics.net/2010/10/06/fun-with-gephis-new-dynamic-visualization-feature/">posted a teaser</a> of the <a href="http://gephi.org/">Gephi</a> <a href="http://gephi.org/gephi-dynamic-features/">dynamic graph</a> feature applied to Twitter data, I thought right away that this was going to bring Twitter visualization to an entirely new level. When you play around with graph visualizations for a while you inevitably come to the conclusion that they are of very limited use for studying something like Twitter because of its dynamicity as an ongoing communicative process. Knowing <em>that</em> someone retweeted someone else a lot or that a certain word occurred many times is only half the story. <em>When</em> someone got a lot of retweets or some word was used frequently is often much more interesting.</p>
<p>Anyhow, <a href="http://snurb.info/">Axel Bruns</a> posted a <a href="http://www.mappingonlinepublics.net/2010/10/20/dynamic-networks-in-gephi-from-twapperkeeper-to-gexf/">first bit of code</a> (for generating GEXF files) back in October, followed by a detailed implementation (<a href="http://www.mappingonlinepublics.net/2010/12/30/visualising-twitter-dynamics-in-gephi-part-1/">1</a>, <a href="http://www.mappingonlinepublics.net/2010/12/30/visualising-twitter-dynamics-in-gephi-part-2/">2</a>) a few days ago. Since Axel uses <a href="http://www.gnu.org/software/gawk/">Gawk</a> and I prefer <a href="http://r-project.org/">R</a>, the first thing I did was to write an R port of Axel&#8217;s code. It does the following:</p>
<ol>
<li>Extract all tweets containing @-messages and retweets from a <a href="http://twapperkeeper.com/index.php">Twapperkeeper</a> hashtag archive.</li>
<li>Generate a table containing the fields <em>sender</em>, <em>recipient</em>, <em>start time</em> and <em>end time</em> for each data point.</li>
<li>Write this table to a <a href="http://gexf.net/format/">GEXF</a> file.</li>
</ol>
<p>The implementation as such wasn&#8217;t difficult and I didn&#8217;t really follow Axel&#8217;s code too closely, since R is syntactically different from Gawk. The thing I needed to figure out was the logic of the GEXF file, specifically start and end times, in order to make sure that edges decay over time. Axel explains this in detail in <a href="http://www.mappingonlinepublics.net/2010/12/30/visualising-twitter-dynamics-in-gephi-part-1/">his post</a> and provides a very thorough and clean implementation.</p>
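<p>To make the start/end logic a bit more concrete, here is a small sketch with made-up tweet times (the numbers are purely illustrative and not taken from any archive). Each edge starts when the tweet is sent and ends one decay period later; the last line flags tweets that arrive while the previous interval is still running.</p>

```r
# Made-up tweet times in seconds (e.g. Unix timestamps); purely illustrative.
time <- c(1000, 1600, 9000)
decaytime <- 3600  # an edge stays visible for one hour after each tweet

start <- time - min(time)  # seconds since the first tweet
end <- start + decaytime   # the edge disappears one decay period later

# TRUE where a tweet arrives before the previous interval has expired
overlaps <- c(FALSE, start[-1] < end[-length(end)])
print(data.frame(start, end, overlaps))
```

<p>How overlapping intervals should be treated (merged into one or kept as separate slices) is a design decision; the sketch only detects them.</p>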
<p>My own implementation is rougher and probably still needs polishing in several places, but here&#8217;s a first result (no sound; watch in HD and fullscreen):</p>
<p><object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/R8s5Qh9WDqU?fs=1&amp;hl=de_DE"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/R8s5Qh9WDqU?fs=1&amp;hl=de_DE" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p><strong>Note 1/7/11: I&#8217;ve replaced the clip above with a better one after ironing out a few issues with my script. The older clip is still available <a href="http://www.youtube.com/watch?v=-SsnFrynKPE">here</a>.</strong></p>
<p><del datetime="2011-01-07T17:24:59+00:00">Like previous visualizations I&#8217;ve done, this also uses the <a href="http://summarizr.labs.eduserv.org.uk/?hashtag=mla09">#MLA09 data</a>, i.e. tweets from the 2009 convention of the <a href="http://www.mla.org/">Modern Language Association</a>.</del><br />
<strong>1/7/11: The newer clip is based on data from <a href="http://dh2010.cch.kcl.ac.uk/">Digital Humanities 2010</a> (#dh2010).</strong></p>
<p>And here&#8217;s the R code for generating the GEXF file, in case you want to play around with it:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=425&amp;download=dynamictwittergraph.R">dynamictwittergraph.R</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p42516"><td class="code" id="p425code16"><pre class="r" style="font-family:monospace;">rm(list=ls(all=T));
&nbsp;
outfile.gexf &lt;- &quot;dh2010.gexf&quot;;
decaytime = 3600;
buffer = 0;
eid = 1;
&nbsp;
tweets &lt;- read.csv(file.choose(), head=T, sep=&quot;|&quot;, quote=&quot;&quot;, fileEncoding=&quot;UTF-8&quot;);
# lower-case the text before matching so that mentions such as @User are found too
ats &lt;- tweets[grep(&quot;@([a-z0-9_]{1,15}):?&quot;, tolower(tweets$text)),];
g.from &lt;- tolower(as.character(ats$from_user));
g.to &lt;- gsub(&quot;^.*@([a-z0-9_]{1,15}):?.*$&quot;, &quot;\\1&quot;, tolower(ats$text), perl=T);
g.start &lt;- ats$time - min(ats$time) + buffer;
g.end &lt;- ats$time - min(ats$time) + decaytime + buffer;
g &lt;- data.frame(from=g.from[], to=g.to[], start=g.start[], end=g.end[]);
g &lt;- g[order(g$from, g$to, g$start),];
output &lt;- paste(&quot;&lt;?xml version=\&quot;1.0\&quot; encoding=\&quot;UTF-8\&quot;?&gt;\n&lt;gexf xmlns=\&quot;http://www.gexf.net/1.2draft\&quot; version=\&quot;1.2\&quot;&gt;\n&lt;graph mode=\&quot;dynamic\&quot; defaultedgetype=\&quot;directed\&quot; start=\&quot;0\&quot; end=\&quot;&quot;, max(g$end) + decaytime, &quot;\&quot;&gt;\n&lt;edges&gt;\n&quot;, sep =&quot;&quot;);
all.from &lt;- as.character(unique(g$from));
for (i in 1:length(all.from))
{
	this.from &lt;- all.from[i];
	# compare names exactly; grep() would also match user names that merely contain this.from
	this.to &lt;- as.character(unique(g$to[g$from == this.from]));
	for (j in 1:length(this.to))
	{
		idx &lt;- g$from == this.from &amp; g$to == this.to[j];
		all.starts &lt;- g$start[idx];
		all.ends &lt;- g$end[idx];
		output &lt;- paste(output, &quot;&lt;edge id=\&quot;&quot;, eid, &quot;\&quot; source=\&quot;&quot;, this.from, &quot;\&quot; target=\&quot;&quot;, this.to[j], &quot;\&quot; start=\&quot;&quot;, min(all.starts), &quot;\&quot; end=\&quot;&quot;, max(all.ends), &quot;\&quot;&gt;\n&lt;attvalues&gt;\n&quot;, sep=&quot;&quot;);
		for (k in 1:length(all.starts))
		{	
			# overlap
			# if (all.starts[k+1] &lt; all.ends[k]) output &lt;- paste(output, &quot;&quot;, sep=&quot;&quot;); ... ?
			output &lt;- paste(output, &quot;\t&lt;attvalue for=\&quot;0\&quot; value=\&quot;1\&quot; start=\&quot;&quot;, all.starts[k], &quot;\&quot; /&gt;\n&quot;, sep=&quot;&quot;);
		}
		output &lt;- paste(output, &quot;&lt;/attvalues&gt;\n&lt;slices&gt;\n&quot;, sep=&quot;&quot;);		
		for (l in 1:length(all.starts))
		{
			output &lt;- paste(output, &quot;\t&lt;slice start=\&quot;&quot;, all.starts[l], &quot;\&quot; end=\&quot;&quot;, all.ends[l], &quot;\&quot; /&gt;\n&quot;, sep=&quot;&quot;);
		}
		output &lt;- paste(output, &quot;&lt;/slices&gt;\n&lt;/edge&gt;\n&quot;, sep=&quot;&quot;);	
		eid = eid + 1;
	}
} 
output &lt;- paste(output, &quot;&lt;/edges&gt;\n&lt;/graph&gt;\n&lt;/gexf&gt;\n&quot;, sep = &quot;&quot;);
cat(output, file=outfile.gexf);</pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/425/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Generating graphs of retweets and @-messages on Twitter using R and Gephi</title>
		<link>http://blog.ynada.com/339</link>
		<comments>http://blog.ynada.com/339#comments</comments>
		<pubDate>Sun, 17 Oct 2010 23:15:46 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[gephi]]></category>
		<category><![CDATA[igraph]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rstats]]></category>
		<category><![CDATA[sna]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=339</guid>
		<description><![CDATA[After recently discovering the excellent methods section on mappingonlinepublics.net, I decided it was time to document my own approach to Twitter data. I&#8217;ve been messing around with R and igraph for a while, but it wasn&#8217;t until I discovered Gephi that things really moved forward. R/igraph are great for preprocessing the data (not sure how [...]]]></description>
			<content:encoded><![CDATA[<p>After recently discovering the <a href="http://www.mappingonlinepublics.net/methods/">excellent methods section</a> on <a href="http://www.mappingonlinepublics.net/">mappingonlinepublics.net</a>, I decided it was time to document my own approach to Twitter data. I&#8217;ve been messing around with <a href="http://www.r-project.org/">R</a> and <a href="igraph.sourceforge.net">igraph</a> for a while, but it wasn&#8217;t until I discovered <a href="http://www.gephi.org">Gephi</a> that things really moved forward. R/igraph are great for preprocessing the data (not sure how they compare with Awk), but rather cumbersome to work with when it comes to visualization. Last week, I posted a first Gephi visualization of <a href="http://files.ynada.com/seadragon/fcrc-rts/index.html">retweeting at the Free Culture Research Conference</a> and since then I&#8217;ve experimented some more (see <a href="http://www.flickr.com/photos/coffee001/sets/72157625181430136/">here</a> and <a href="http://files.ynada.com/graphs/s21-ats2.pdf">here</a>). #FCRC was a test case for a larger study that examines how academics use Twitter at conferences, which is part of what we&#8217;re doing at the junior researchers group <a href="http://nfgwin.uni-duesseldorf.de">Science and the Internet</a> at the University of Düsseldorf (sorry, website is currently in German only).</p>
<p><a href="http://blog.ynada.com/wp-content/uploads/2010/10/mla-small.png"><img src="http://blog.ynada.com/wp-content/uploads/2010/10/mla-small-300x243.png" alt="" title="mla-small" width="300" height="243" class="alignleft size-medium wp-image-346" /></a></p>
<p>Here&#8217;s a step-by-step description of how those graphs were created.</p>
<p><strong>Step #1: Get tweets from Twapperkeeper</strong><br />
Like <a href="http://snurb.info/">Axel</a>, I use <a href="http://www.twapperkeeper.com/">Twapperkeeper</a> to retrieve tweets tagged with the hashtag I&#8217;m investigating. This has several advantages:</p>
<ul>
<li>it&#8217;s possible to retrieve older tweets which you won&#8217;t get via the <a href="http://apiwiki.twitter.com/">API</a></li>
<li>tweets are stored as CSV rather than XML which makes them easier to work with for our purposes.</li>
</ul>
<p>The main disadvantage of Twapperkeeper is that we have to rely on the integrity of their archive: if for some reason not all tweets with our hashtag have been retrieved, we won&#8217;t know. Also, Twapperkeeper&#8217;s CSV files omit certain information that is present in Twitter&#8217;s XML and that we might be interested in (e.g. geolocation).</p>
<p>Instructions:</p>
<ol>
<li>Search for the hashtag you&#8217;re interested in (e.g. <a href="http://twapperkeeper.com/hashtag/fcrc">#FCRC</a>). If no archive exists, create one.</li>
<li>Go to the archive&#8217;s Twapperkeeper page, sign into Twitter (button at the top) and then choose export and download at the bottom of the page.</li>
<li>Choose the pipe character (&#8220;|&#8221;) as separator. I use that one rather than the more conventional comma or semicolon because we are dealing with text data which is bound to contain these characters a lot. Of course tweets can contain the pipe as well, in which case those lines will be parsed incorrectly, so be sure to have a look at the graph file you make.</li>
<li>Voila. You should now have a CSV file containing tweets on your hard drive. <strong>Edit:</strong> Actually, you have a .tar file that contains the tweets. Look inside the .tar for a file with a very long name ending with &#8220;-1&#8221; (not &#8220;info&#8221;); that&#8217;s the data we&#8217;re looking for.</li>
</ol>
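<p>One way to check whether the pipe-separated export was parsed cleanly is to count the fields per line before reading the file. The sketch below uses a tiny made-up file; the column names <code>text</code> and <code>from_user</code> are the ones the script in step 2 relies on, everything else is invented for illustration.</p>

```r
# Write a tiny made-up archive to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("text|from_user",
             "RT @alice: commas, semicolons; and \"quotes\" are fine|bob",
             "@bob thanks! #fcrc|alice"), tmp)

# Every line should have the same number of fields. comment.char is switched
# off because hashtags would otherwise be treated as comments.
fields <- count.fields(tmp, sep = "|", quote = "", comment.char = "")
print(table(fields))

tweets <- read.csv(tmp, header = TRUE, sep = "|", quote = "", fileEncoding = "UTF-8")
print(dim(tweets))
```

<p>If <code>table(fields)</code> shows more than one value, some tweets probably contain the separator themselves and those lines deserve a closer look.</p>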
<p><strong>Step #2: Turn CSV data into a graph file with R and igraph</strong><br />
R is an open source statistics package that is primarily used via the command line. It&#8217;s absolutely fantastic at slicing and dicing data, although the syntax is a bit quirky and the documentation is somewhat geared towards experts (=statisticians). igraph is an R package for constructing and visualizing graphs. It&#8217;s great for a variety of purposes, but due to the command line approach of R, actually drawing graphs with igraph was somewhat difficult for me. But, as outlined below, Gephi took care of that. Running the code below in R will transform the CSV data into a GraphML file which can then be visualized with Gephi. While R and igraph rock at translating the data into another format, Gephi is the better tool for the actual visualization.</p>
<p>Instructions:</p>
<ol>
<li>Download and install <a href="http://www.r-project.org">R</a>.</li>
<li>In the R console, run the following: <code>install.packages(&quot;igraph&quot;);</code></li>
<li>Copy the CSV you&#8217;ve just downloaded from Twapperkeeper to an empty directory and rename it to <strong>tweets.csv</strong>.</li>
<li>Finally, save the R file below to the same folder as the CSV and run it.</li>
</ol>
<p>Code for extracting RTs and @s from a Twapperkeeper CSV file and saving the result in the GraphML format:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://blog.ynada.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=339&amp;download=tweetgraph.R">tweetgraph.R</a></span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p33918"><td class="code" id="p339code18"><pre class="r" style="font-family:monospace;"># Extract @-message and RT graphs from conference tweets
library(igraph);
&nbsp;
# Read Twapperkeeper CSV file
tweets &lt;- read.csv(&quot;tweets.csv&quot;, head=T, sep=&quot;|&quot;, quote=&quot;&quot;, fileEncoding=&quot;UTF-8&quot;);
print(paste(&quot;Read &quot;, length(tweets$text), &quot; tweets.&quot;, sep=&quot;&quot;));
&nbsp;
# Get @-messages, senders, receivers
ats &lt;- grep(&quot;^\\.?@[a-z0-9_]{1,15}&quot;, tolower(tweets$text), perl=T, value=T);
at.sender &lt;- tolower(as.character(tweets$from_user[grep(&quot;^\\.?@[a-z0-9_]{1,15}&quot;, tolower(tweets$text), perl=T)]));
at.receiver &lt;- gsub(&quot;^\\.?@([a-z0-9_]{1,15})[^a-z0-9_]+.*$&quot;, &quot;\\1&quot;, ats, perl=T);
print(paste(length(ats), &quot; @-messages from &quot;, length(unique(at.sender)), &quot; senders and &quot;, length(unique(at.receiver)), &quot; receivers.&quot;, sep=&quot;&quot;));
&nbsp;
# Get RTs, senders, receivers
rts &lt;- grep(&quot;^rt @[a-z0-9_]{1,15}&quot;, tolower(tweets$text), perl=T, value=T);
rt.sender &lt;- tolower(as.character(tweets$from_user[grep(&quot;^rt @[a-z0-9_]{1,15}&quot;, tolower(tweets$text), perl=T)]));
rt.receiver &lt;- gsub(&quot;^rt @([a-z0-9_]{1,15})[^a-z0-9_]+.*$&quot;, &quot;\\1&quot;, rts, perl=T);
print(paste(length(rts), &quot; RTs from &quot;, length(unique(rt.sender)), &quot; senders and &quot;, length(unique(rt.receiver)), &quot; receivers.&quot;, sep=&quot;&quot;));
&nbsp;
# This is necessary to avoid problems with empty entries, usually caused by encoding issues in the source files
at.sender[at.sender==&quot;&quot;] &lt;- &quot;&lt;NA&gt;&quot;;
at.receiver[at.receiver==&quot;&quot;] &lt;- &quot;&lt;NA&gt;&quot;;
rt.sender[rt.sender==&quot;&quot;] &lt;- &quot;&lt;NA&gt;&quot;;
rt.receiver[rt.receiver==&quot;&quot;] &lt;- &quot;&lt;NA&gt;&quot;;
&nbsp;
# Create a data frame from the sender-receiver information
ats.df &lt;- data.frame(at.sender, at.receiver);
rts.df &lt;- data.frame(rt.sender, rt.receiver);
&nbsp;
# Transform data frame into a graph
ats.g &lt;- graph.data.frame(ats.df, directed=T);
rts.g &lt;- graph.data.frame(rts.df, directed=T);
&nbsp;
# Write sender -&gt; receiver information to a GraphML file
print(&quot;Write sender -&gt; receiver table to GraphML file...&quot;);
write.graph(ats.g, file=&quot;ats.graphml&quot;, format=&quot;graphml&quot;);
write.graph(rts.g, file=&quot;rts.graphml&quot;, format=&quot;graphml&quot;);</pre></td></tr></table></div>
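<p>The pattern matching used in the script above can be tried out on a handful of made-up tweets without igraph or a Twapperkeeper file. This is only a sketch: the tweets and user names are invented, and it uses <code>grepl()</code> once per pattern instead of repeating <code>grep()</code>, which is a slight variation on the script rather than a copy of it.</p>

```r
# Made-up tweets and senders, standing in for the text and from_user columns
text <- c("RT @Alice: great talk!", ".@bob see you there",
          "@carol: thanks", "no mention here")
from_user <- c("Bob", "alice", "Dave", "erin")

lower <- tolower(text)
is.rt <- grepl("^rt @[a-z0-9_]{1,15}", lower, perl = TRUE)
is.at <- grepl("^\\.?@[a-z0-9_]{1,15}", lower, perl = TRUE)

# One row per edge: who sent the tweet and who it was directed at
rt.edges <- data.frame(sender = tolower(from_user[is.rt]),
                       receiver = sub("^rt @([a-z0-9_]{1,15}).*$", "\\1",
                                      lower[is.rt], perl = TRUE))
at.edges <- data.frame(sender = tolower(from_user[is.at]),
                       receiver = sub("^\\.?@([a-z0-9_]{1,15}).*$", "\\1",
                                      lower[is.at], perl = TRUE))
print(rt.edges)
print(at.edges)
```

<p>Data frames of this shape (sender in the first column, receiver in the second) are what <code>graph.data.frame()</code> expects as an edge list.</p>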

<p><strong>Step #3: Visualize graph with Gephi</strong><br />
Once you&#8217;ve completed steps 1 and 2, simply open your GraphML file(s) with Gephi. You should see a visualization of the graph. I won&#8217;t give an in-depth description of how Gephi works, but the <a href="http://gephi.org/users/">users section</a> of gephi.org has great tutorials which explain both Gephi and graph visualization in general really well.</p>
<p>I&#8217;ll post more on the topic as I make further progress, for example with stuff like dynamic graphs which show change in the network over time.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/339/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Graphing Twitter friends/followers with R (updated)</title>
		<link>http://blog.ynada.com/279</link>
		<comments>http://blog.ynada.com/279#comments</comments>
		<pubDate>Thu, 24 Jun 2010 22:37:23 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=279</guid>
		<description><![CDATA[Edit: And here is an update of the update, this one contributed by Kai Heinrich. Here&#8217;s an updated version of my script from last month, something I&#8217;ve been meaning to do for a while. I thank Anatol Stefanowitsch and Gábor Csárdi for improving my quite sloppy code. # Load twitteR and igraph packages. library(twitteR) library(igraph) [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Edit:</strong> And here is <a href="http://blog.ynada.com/864">an update of the update</a>, this one contributed by <a href="http://tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_wirtschaftswissenschaften/wi/wiid/professur/wiid_heinrich">Kai Heinrich</a>.</p>
<p>Here&#8217;s an updated version of <a href="http://blog.ynada.com/247">my script from last month</a>, something I&#8217;ve been meaning to do for a while. I thank <a href="http://www-user.uni-bremen.de/~anatol/">Anatol Stefanowitsch</a> and <a href="http://cneuro.rmki.kfki.hu/people/csardi">Gábor Csárdi</a> for improving my quite sloppy code.</p>
<p><code><br />
# Load twitteR and igraph packages.<br />
library(twitteR)<br />
library(igraph)<br />
</code><br />
<code><br />
# Start a Twitter session.<br />
sess <- initSession('USERNAME', 'PASSWORD')<br />
</code><br />
<code><br />
# Retrieve a maximum of 20 friends/followers for yourself or someone else. Note that<br />
# at the moment, the limit parameter does not [yet] seem to be working.<br />
friends.object <- userFriends('USERNAME', n=20, sess)<br />
followers.object <- userFollowers('USERNAME', n=20, sess)<br />
</code><br />
<code><br />
# Retrieve the names of your friends and followers from the friend<br />
# and follower objects.<br />
friends <- sapply(friends.object,name)<br />
followers <- sapply(followers.object,name)<br />
</code><br />
<code><br />
# Create a data frame that relates friends and followers to you for expression in the graph<br />
relations <- merge(data.frame(User='YOUR_NAME', Follower=friends), data.frame(User=followers, Follower='YOUR_NAME'), all=T)<br />
</code><br />
<code><br />
# Create graph from relations.<br />
g <- graph.data.frame(relations, directed = T)<br />
</code><br />
<code><br />
# Assign labels to the graph (=people's names)<br />
V(g)$label <- V(g)$name<br />
</code><br />
<code><br />
# Plot the graph using plot() or tkplot().<br />
tkplot(g)<br />
</code></p>
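<p>To see what the <code>merge()</code> step does without a Twitter session, here is a small sketch with made-up names (the friends and followers are invented; in the script above they come from the twitteR objects). Because both data frames share the same column names, <code>merge(..., all=TRUE)</code> ends up with every row from both edge lists.</p>

```r
# Made-up names standing in for the twitteR results
me <- "YOUR_NAME"
friends <- c("anna", "ben")     # accounts you follow
followers <- c("ben", "carla")  # accounts following you

# Same column names as in the script above
relations <- merge(data.frame(User = me, Follower = friends),
                   data.frame(User = followers, Follower = me),
                   all = TRUE)
print(relations)
```

<p>Note that "ben" appears in two rows, once in each direction, which is what turns a mutual relationship into two directed edges in the graph.</p>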
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/279/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Code and brief instruction for graphing Twitter with R</title>
		<link>http://blog.ynada.com/247</link>
		<comments>http://blog.ynada.com/247#comments</comments>
		<pubDate>Sun, 23 May 2010 22:54:37 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[THATcamp]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=247</guid>
		<description><![CDATA[Edit: I&#8217;ve posted an updated version of the script here. It is not quite as compressed as Anatol&#8217;s version, but I think it&#8217;s a decent compromise between readability and efficiency. Edit #2 And yet another update, this one contributed by Kai Heinrich. I hacked together some code for R last night to visualize a Twitter [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Edit:</strong> I&#8217;ve posted an updated version of the script <a href="http://blog.ynada.com/279">here</a>. It is not quite as compressed as Anatol&#8217;s version, but I think it&#8217;s a decent compromise between readability and efficiency. <img src='http://blog.ynada.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><strong>Edit #2</strong> And <a href="http://blog.ynada.com/864">yet another update</a>, this one contributed by <a href="http://tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_wirtschaftswissenschaften/wi/wiid/professur/wiid_heinrich">Kai Heinrich</a>.</p>
<p>I hacked together some code for <a href="http://www.r-project.org/">R</a> last night to visualize a Twitter graph (=who you are following and who is following you) that I briefly showed at <a href="http://thatcamp.org/2010/visualizing-text/">the session on visualizing text</a> today at <a href="http://thatcamp.org/">THATCamp</a> and that I wanted to share. My comments in the code are very basic and there is much to improve, but in the spirit of &#8220;release early, release often&#8221;, I think it&#8217;s better to get it out there right away.</p>
<p>Ingredients:</p>
<ul>
<li><a href="http://www.r-project.org/">R</a></li>
<li><a href="http://cran.r-project.org/web/packages/twitteR/index.html">twitteR package</a></li>
<li><a href="http://igraph.sourceforge.net/">igraph package</a></li>
</ul>
<p>Note that packages are most easily installed with the <code>install.packages()</code> function inside of R, so R is really the only thing you need to download initially.</p>
<p><strong>Code:</strong></p>
<p><code># Load twitteR package<br />
library(twitteR)</code></p>
<p><code># Load igraph package<br />
library(igraph)</code><br />
<code><br />
# Set up friends and followers as vectors. This, along with some stuff below, is not really necessary, but the result of my relative inability to deal with the twitter user object in an elegant way. I'm hopeful that I will figure out a way of shortening this in the future</code></p>
<p><code>friends <- as.character()<br />
followers <- as.character()</code></p>
<p><code># Start a Twitter session. Note that the user through whom the session is started doesn't have to be the one that you search for in the next step. I'm using myself (coffee001) in the code below, but you could authenticate with your username and then search for somebody else.</code></p>
<p><code>sess <- initSession('coffee001', 'mypassword')</code><br />
<code><br />
# Retrieve a maximum of 500 friends for user 'coffee001'.</code></p>
<p><code>friends.object <- userFriends('coffee001', n=500, sess)</code></p>
<p><code># Retrieve a maximum of 500 followers for 'coffee001'. Note that retrieving many/all of your followers will create a very busy graph, so if you are experimenting it's better to start with a small number of people (I used 25 for the graph below).</code></p>
<p><code>followers.object <- userFollowers('coffee001', n=500, sess)</code></p>
<p><code># This code is necessary at the moment, but only because I don't know how to slice just the "name" field for friends and followers from the list of user objects that twitteR retrieves. I am 100% sure there is an alternative to looping over the objects, I just haven't found it yet. Let me know if you do...</code></p>
<p><code>for (i in 1:length(friends.object))<br />
{<br />
	friends <- c(friends, friends.object[[i]]@name);<br />
}</code><br />
<code><br />
for (i in 1:length(followers.object))<br />
{<br />
	followers <- c(followers, followers.object[[i]]@name);<br />
}</code></p>
<p><code><br />
# Create data frames that relate friends and followers to the user you search for and merge them.</code></p>
<p><code>relations.1 <- data.frame(User='Cornelius', Follower=friends)<br />
relations.2 <- data.frame(User=followers, Follower='Cornelius')<br />
relations <- merge(relations.1, relations.2, all=T)</code></p>
<p><code># Create graph from relations.</code></p>
<p><code>g <- graph.data.frame(relations, directed = T)</code></p>
<p><code># Assign labels to the graph (=people's names)</code></p>
<p><code>V(g)$label <- V(g)$name</code></p>
<p><code># Plot the graph.</code></p>
<p><code>plot(g)</code></p>
<p>For the screenshot below I've used the <code>tkplot()</code> method instead of <code>plot()</code>, which allows you to move around and highlight elements interactively with the mouse after plotting them. The graph only shows 20 people in order to keep the complexity manageable. </p>
<p><a href="http://blog.ynada.com/wp-content/uploads/2010/05/twitter.png"><img src="http://blog.ynada.com/wp-content/uploads/2010/05/twitter-300x175.png" alt="" title="twitter" width="300" height="175" class="alignleft size-medium wp-image-251" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/247/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Corpus Linguistics with R, Day 2</title>
		<link>http://blog.ynada.com/126</link>
		<comments>http://blog.ynada.com/126#comments</comments>
		<pubDate>Tue, 28 Jul 2009 12:29:44 +0000</pubDate>
		<dc:creator>cornelius</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://blog.ynada.com/?p=126</guid>
		<description><![CDATA[R Lesson 2 text gsub ("second", "third", text) SEARCH-REPLACE-SUBJECT [1] "This is a first example sentence." [2] "And this is a third example sentence." > gsub ("n", "X", text) [1] "This is a first example seXteXce." [2] "AXd this is a secoXd example seXteXce." > gsub ("is", "was", text) [1] "Thwas was a first example [...]]]></description>
			<content:encoded><![CDATA[<p>R Lesson 2</p>
<p></code><br />
text<-c("This is a first example sentence.", "And this is a second example sentence.")</p>
<p># gsub replaces stuff in strings; the argument order is SEARCH, REPLACE, SUBJECT</p>
<p>> gsub ("second", "third", text)<br />
[1] "This is a first example sentence."<br />
[2] "And this is a third example sentence."<br />
> gsub ("n", "X", text)<br />
[1] "This is a first example seXteXce."<br />
[2] "AXd this is a secoXd example seXteXce."<br />
> gsub ("is", "was", text)<br />
[1] "Thwas was a first example sentence."<br />
[2] "And thwas was a second example sentence."</p>
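<p># A possible refinement, not part of the original lesson: in Perl-style regex "\\b" marks a word boundary, so only the standalone word "is" is replaced. The name "sentences" is illustrative.<br />
sentences <- c("This is a first example sentence.", "And this is a second example sentence.")<br />
whole_word <- gsub("\\bis\\b", "was", sentences, perl=TRUE)<br />
whole_word # "This" and "this" are left untouched</p>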
<p>---</p>
<p>Perl-style regex</p>
<p>^	beginning of string, e.g. "^x"; as the first character inside [] it negates the class instead, e.g. [^aeiou]<br />
$	end of string, e.g. "x$"<br />
.	any single character<br />
\	escape character; inside an R string it must be doubled ("\\")<br />
[]	character classes, e.g. [aeiou] matches vowels, [a-h] is the same as [abcdefgh]<br />
{MIN,MAX}	number of repetitions of the immediately preceding unit (character or group)</p>
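<p># A small sketch of the metacharacters listed above; the vector "words" is made up for illustration and is not part of the lesson data.<br />
words <- c("xylophone", "box", "a.b", "aXb", "good", "gooood")<br />
starts_x <- grepl("^x", words, perl=TRUE) # "^x": x at the beginning of the string<br />
ends_x <- grepl("x$", words, perl=TRUE) # "x$": x at the end of the string<br />
any_char <- grepl("a.b", words, perl=TRUE) # unescaped ".": any single character between a and b<br />
literal_dot <- grepl("a\\.b", words, perl=TRUE) # "\\.": a literal dot only<br />
no_vowel_start <- grepl("^[^aeiou]", words, perl=TRUE) # "^" inside []: negation, i.e. does not start with a vowel<br />
three_to_four_o <- grepl("o{3,4}", words, perl=TRUE) # {MIN,MAX}: three to four o's in a row<br />
words[three_to_four_o] # only "gooood" qualifies</p>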
<p>examples<br />
lo+l	matches "lol", "lool", "loool", ... ("+" means one or more of the preceding unit)</p>
<p>> grep("analy[sz]e", c("analyze", "analyse", "moo"), perl=T, value=T)<br />
[1] "analyze" "analyse"</p>
<p>> grep("(first|second)", text, perl=T, value=T)<br />
[1] "This is a first example sentence."<br />
[2] "And this is a second example sentence."<br />
> grep("(first|lalala)", text, perl=T, value=T)<br />
[1] "This is a first example sentence."</p>
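<p># z is not defined in these notes; the value below is an assumption consistent with the output shown<br />
> z<-c("aabbccdd", "ababcdcd")<br />
> grep("ab{2}", z, perl=T, value=T) # {2} applies to the b only: "abb"<br />
[1] "aabbccdd"<br />
> grep("(ab){2}", z, perl=T, value=T) # {2} applies to the whole group: "abab"<br />
[1] "ababcdcd"<br />
> gsub("a (first|second)", "another", text, perl=T)<br />
[1] "This is another example sentence."<br />
[2] "And this is another example sentence."<br />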
> gsub("[abcdefgh]", "X", text, perl=T)<br />
[1] "TXis is X Xirst XxXmplX sXntXnXX."<br />
[2] "AnX tXis is X sXXonX XxXmplX sXntXnXX."</p>
<p>> grep("forg[eo]t(s|ting|ten)?_v", a.corpus.file, perl=T, value=T)<br />
# finds all forms of "forget" that carry the suffix _v in a.corpus.file</p>
<p># *? is lazy matching: the match stops at the first possible end instead of the last, e.g.</p>
<p>> gregexpr("s.*?s", text[1], perl=T)<br />
[[1]]<br />
[1]  4 14<br />
attr(,"match.length")<br />
[1]  4 12</p>
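<p># A hedged sketch contrasting greedy ".*" with lazy ".*?"; regmatches() is base R and extracts the matched substrings, and "s1" is an illustrative name for the first example sentence.<br />
s1 <- "This is a first example sentence."<br />
greedy <- regmatches(s1, gregexpr("s.*s", s1, perl=TRUE))[[1]]<br />
lazy <- regmatches(s1, gregexpr("s.*?s", s1, perl=TRUE))[[1]]<br />
greedy # one long match, from the first "s" to the last "s"<br />
lazy # two short matches, each ending at the nearest following "s"</p>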
<p># note: characters consumed by one match cannot be matched again in the same pass</p>
<p># text has been redefined here as two sentences containing years; the assignment is missing from these notes<br />
> gsub("(19|20)[0-9]{2}", "YEAR", text)<br />
[1] "They killed 250 people in YEAR." "No, it was in YEAR."<br />
> #replaces only 19xx and 20xx; "250" is left alone</p>
<p>---</p>
<p>> textfile<-scan(file.choose(), what="char", sep="\n")<br />
Enter file name: corp_gpl_short.txt<br />
Read 9 items<br />
> textfile<-tolower(textfile)<br />
> textfile<br />
[1] "the licenses for most software are designed to take away your"<br />
[2] "freedom to share and change it. by contrast, the gnu general public"<br />
[3] "license is intended to guarantee your freedom to share and change free"<br />
[4] "software--to make sure the software is free for all its users. this"<br />
[5] "general public license applies to most of the free software"<br />
[6] "foundation's software and to any other program whose authors commit to"<br />
[7] "using it. (some other free software foundation software is covered by"<br />
[8] "the gnu library general public license instead.) you can apply it to"<br />
[9] "your programs, too."<br />
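> # mistake: "//W" is written with forward slashes, so it is a literal pattern that matches nothing and the lines come back unsplit<br />
> unlist(strsplit(textfile, "//W"))<br />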
[1] "the licenses for most software are designed to take away your"<br />
[2] "freedom to share and change it. by contrast, the gnu general public"<br />
[3] "license is intended to guarantee your freedom to share and change free"<br />
[4] "software--to make sure the software is free for all its users. this"<br />
[5] "general public license applies to most of the free software"<br />
[6] "foundation's software and to any other program whose authors commit to"<br />
[7] "using it. (some other free software foundation software is covered by"<br />
[8] "the gnu library general public license instead.) you can apply it to"<br />
[9] "your programs, too."<br />
> text_split<-unlist(strsplit(textfile, "\\W")) # corrected pattern: a backslash, doubled inside the R string</p>
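<p># A small sketch of why the separator pattern matters; "s2" is a made-up string, not read from the corpus file.<br />
s2 <- "using it. (some other"<br />
no_split <- strsplit(s2, "//W")[[1]] # forward slashes: a literal pattern that never matches, so nothing is split<br />
single <- strsplit(s2, "\\W")[[1]] # every non-word character is a separator, so ". (" leaves two empty strings<br />
runs <- strsplit(s2, "\\W+")[[1]] # a run of non-word characters counts as one separator<br />
length(single) - length(runs) # number of empty strings avoided by using "+"</p>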
<p># "\\W" splits at every single non-word character, so a run such as ". " leaves an empty string behind; these are the blank entry with count 9 in the table<br />
> sort(table(text_split), decreasing=T)<br />
text_split<br />
                   to   software        the       free        and    general<br />
         9          9          7          5          4          3          3<br />
        is         it    license     public       your         by     change<br />
         3          3          3          3          3          2          2<br />
       for foundation    freedom        gnu       most      other      share<br />
         2          2          2          2          2          2          2<br />
       all        any    applies      apply        are    authors       away<br />
         1          1          1          1          1          1          1<br />
       can     commit   contrast    covered   designed  guarantee    instead<br />
         1          1          1          1          1          1          1<br />
  intended        its    library   licenses       make         of    program<br />
         1          1          1          1          1          1          1<br />
  programs          s       some       sure       take       this        too<br />
         1          1          1          1          1          1          1<br />
     users      using      whose        you<br />
         1          1          1          1<br />
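> </p>
<p>> # splitting on "\\W+" treats a run of non-word characters as a single separator, so no empty strings are produced<br />
> text_split<-unlist(strsplit(textfile, "\\W+"))<br />
> text_freqs<-sort(table(text_split), decreasing=T)<br />
> text_freqs<br />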
text_split<br />
        to   software        the       free        and    general         is<br />
         9          7          5          4          3          3          3<br />
        it    license     public       your         by     change        for<br />
         3          3          3          3          2          2          2<br />
foundation    freedom        gnu       most      other      share        all<br />
         2          2          2          2          2          2          1<br />
       any    applies      apply        are    authors       away        can<br />
         1          1          1          1          1          1          1<br />
    commit   contrast    covered   designed  guarantee    instead   intended<br />
         1          1          1          1          1          1          1<br />
       its    library   licenses       make         of    program   programs<br />
         1          1          1          1          1          1          1<br />
         s       some       sure       take       this        too      users<br />
         1          1          1          1          1          1          1<br />
     using      whose        you<br />
         1          1          1<br />
> text_freqs[text_freqs>1]<br />
text_split<br />
        to   software        the       free        and    general         is<br />
         9          7          5          4          3          3          3<br />
        it    license     public       your         by     change        for<br />
         3          3          3          3          2          2          2<br />
foundation    freedom        gnu       most      other      share<br />
         2          2          2          2          2          2<br />
> </p>
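<p># stop_list is not defined in these notes; the value below is an assumption reconstructed from the output (the, and, of are the words that get removed)<br />
> stop_list<-c("the", "and", "of")<br />
> !(text_split %in% stop_list)<br />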
 [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE<br />
[13]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE<br />
[25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE<br />
[37]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE<br />
[49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE<br />
[61]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE<br />
[73]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE<br />
[85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE<br />
> text_stopremoved<-text_split[!(text_split %in% stop_list)]<br />
> text_stopremoved<br />
 [1] "licenses"   "for"        "most"       "software"   "are"<br />
 [6] "designed"   "to"         "take"       "away"       "your"<br />
[11] "freedom"    "to"         "share"      "change"     "it"<br />
[16] "by"         "contrast"   "gnu"        "general"    "public"<br />
[21] "license"    "is"         "intended"   "to"         "guarantee"<br />
[26] "your"       "freedom"    "to"         "share"      "change"<br />
[31] "free"       "software"   "to"         "make"       "sure"<br />
[36] "software"   "is"         "free"       "for"        "all"<br />
[41] "its"        "users"      "this"       "general"    "public"<br />
[46] "license"    "applies"    "to"         "most"       "free"<br />
[51] "software"   "foundation" "s"          "software"   "to"<br />
[56] "any"        "other"      "program"    "whose"      "authors"<br />
[61] "commit"     "to"         "using"      "it"         "some"<br />
[66] "other"      "free"       "software"   "foundation" "software"<br />
[71] "is"         "covered"    "by"         "gnu"        "library"<br />
[76] "general"    "public"     "license"    "instead"    "you"<br />
[81] "can"        "apply"      "it"         "to"         "your"<br />
[86] "programs"   "too"<br />
> </p>
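<p># The steps above, collected into one function as a rough sketch; the name "word_freqs", its arguments and the two sample sentences are hypothetical, not part of the lesson.<br />
word_freqs <- function(lines, stop_list = character(0)) {<br />
words <- unlist(strsplit(tolower(lines), "\\W+")) # lower-case, then split on runs of non-word characters<br />
words <- words[words != ""] # drop the empty string that a leading separator would produce<br />
words <- words[!(words %in% stop_list)] # remove stop words<br />
sort(table(words), decreasing=TRUE) # frequency list, most frequent first<br />
}<br />
freqs <- word_freqs(c("The cat and the hat.", "A cat!"), stop_list=c("the", "and"))<br />
freqs[freqs > 1] # only "cat" occurs more than once</p>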
<p># load and run an R script file<br />
source("something.r")</p>
<p></code></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ynada.com/126/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

