Jun 25

Here’s an updated version of my script from last month, something I’ve been meaning to do for a while. I thank Anatol Stefanowitsch and Gábor Csárdi for improving my quite sloppy code.


# Load twitteR and igraph packages.
library(twitteR)
library(igraph)


# Start a Twitter session.
sess <- initSession('USERNAME', 'PASSWORD')


# Retrieve a maximum of 20 friends/followers for yourself or someone else. Note that
# at the moment, the limit parameter does not yet seem to be working.
friends.object <- userFriends('USERNAME', n=20, sess)
followers.object <- userFollowers('USERNAME', n=20, sess)


# Retrieve the names of your friends and followers from the friend
# and follower objects.
friends <- sapply(friends.object,name)
followers <- sapply(followers.object,name)


# Create a data frame that relates friends and followers to you for expression in the graph
relations <- merge(data.frame(User='YOUR_NAME', Follower=friends), data.frame(User=followers, Follower='YOUR_NAME'), all=T)


# Create graph from relations.
g <- graph.data.frame(relations, directed = T)


# Assign labels to the graph (=people's names)
V(g)$label <- V(g)$name


# Plot the graph using plot() or tkplot().
tkplot(g)
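
The data-frame step can be tried without a Twitter session. Here is a minimal sketch in base R, assuming made-up account names: `me`, `friends` and `followers` are hypothetical stand-ins for what twitteR would return, and `mutual` is an extra that is not in the script above.

```r
# Hypothetical stand-ins for the names retrieved from Twitter.
me        <- "YOUR_NAME"
friends   <- c("alice", "bob")     # accounts you follow
followers <- c("bob", "carol")     # accounts that follow you

# Same construction as in the script: one row per directed relation.
relations <- merge(data.frame(User = me, Follower = friends),
                   data.frame(User = followers, Follower = me),
                   all = TRUE)

# People who appear on both sides are mutual connections.
mutual <- intersect(friends, followers)

print(relations)
print(mutual)
```

The resulting `relations` data frame has the two-column shape that graph.data.frame() expects.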

May 24

Edit: I’ve posted an updated version of the script here. It is not quite as compressed as Anatol’s version, but I think it’s a decent compromise between readability and efficiency. :-)

I hacked together some code for R last night to visualize a Twitter graph (=who you are following and who is following you) that I briefly showed at the session on visualizing text today at THATCamp and that I wanted to share. My comments in the code are very basic and there is much to improve, but in the spirit of “release early, release often”, I think it’s better to get it out there right away.

Ingredients:

Note that packages are most easily installed with the install.packages() function inside of R, so R is really the only thing you need to download initially.

Code:

# Load twitteR package
library(twitteR)

# Load igraph package
library(igraph)


# Set up friends and followers as vectors. This, along with some stuff below, is not really necessary, but the result of my relative inability to deal with the twitter user object in an elegant way. I'm hopeful that I will figure out a way of shortening this in the future

friends <- as.character()
followers <- as.character()

# Start a Twitter session. Note that the user through whom the session is started doesn't have to be the one that you search for in the next step. I'm using myself (coffee001) in the code below, but you could authenticate with your username and then search for somebody else.

sess <- initSession('coffee001', 'mypassword')

# Retrieve a maximum of 500 friends for user 'coffee001'.

friends.object <- userFriends('coffee001', n=500, sess)

# Retrieve a maximum of 500 followers for 'coffee001'. Note that retrieving many/all of your followers will create a very busy graph, so if you are experimenting it's better to start with a small number of people (I used 25 for the graph below).

followers.object <- userFollowers('coffee001', n=500, sess)

# This code is necessary at the moment, but only because I don't know how to slice just the "name" field for friends and followers from the list of user objects that twitteR retrieves. I am 100% sure there is an alternative to looping over the objects, I just haven't found it yet. Let me know if you do...

for (i in 1:length(friends.object))
{
friends <- c(friends, friends.object[[i]]@name);
}


for (i in 1:length(followers.object))
{
followers <- c(followers, followers.object[[i]]@name);
}


# Create data frames that relate friends and followers to the user you search for and merge them.

relations.1 <- data.frame(User='Cornelius', Follower=friends)
relations.2 <- data.frame(User=followers, Follower='Cornelius')
relations <- merge(relations.1, relations.2, all=T)

# Create graph from relations.

g <- graph.data.frame(relations, directed = T)

# Assign labels to the graph (=people's names)

V(g)$label <- V(g)$name

# Plot the graph.

plot(g)

For the screenshot below I've used the tkplot() method instead of plot(), which allows you to move around and highlight elements interactively with the mouse after plotting them. The graph only shows 20 people in order to keep the complexity manageable.

Jul 28

R Lesson 2


text<-c("This is a first example sentence.", "And this is a second example sentence.")

# gsub replaces stuff in strings

# argument order: gsub(SEARCH, REPLACE, SUBJECT)
> gsub ("second", "third", text)
[1] "This is a first example sentence."
[2] "And this is a third example sentence."
> gsub ("n", "X", text)
[1] "This is a first example seXteXce."
[2] "AXd this is a secoXd example seXteXce."
> gsub ("is", "was", text)
[1] "Thwas was a first example sentence."
[2] "And thwas was a second example sentence."
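
The last call also rewrites the "is" inside "This". One way around that, sketched here as an aside that is not part of the lesson notes, is the word-boundary anchor \b (written "\\b" in an R string):

```r
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

# \\b matches at a word boundary, so only the standalone word "is" is replaced.
fixed <- gsub("\\bis\\b", "was", text, perl = TRUE)
print(fixed)
```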

---

Perl-style regex

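^ beginning of str, e.g. "^x", ***OR*** NOT when it is the first char inside of []
$ end of str, e.g. "x$"
. any single char
\ escape char - TWO ("\\") needed in an R string
[] character classes, e.g. [aeiou] vowels, [a-h] is same as [abcdefgh]
{MIN,MAX} number of repetitions of the immediately preceding unit (character)
+ one or more of the immediately preceding unit

examples
lo+l matches "lol", "lool", "loool", ...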
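
A few of the symbols above can be tried on a small vector. This is only a sketch; the vector `words` and the patterns are made up for illustration.

```r
words <- c("ll", "lol", "lool", "looool")

# + means "one or more" of the preceding unit, so "ll" does not match.
plus_hits <- grepl("^lo+l$", words, perl = TRUE)

# {MIN,MAX} bounds the repetition: here between one and two "o"s.
range_hits <- grepl("^lo{1,2}l$", words, perl = TRUE)

# Inside [], a leading ^ negates the class: strings without any vowel.
no_vowel <- grepl("^[^aeiou]+$", words, perl = TRUE)

print(data.frame(words, plus_hits, range_hits, no_vowel))
```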

> grep("analy[sz]e", c("analyze", "analyse", "moo"), perl=T, value=T)
[1] "analyze" "analyse"

> grep("(first|second)", text, perl=T, value=T)
[1] "This is a first example sentence."
[2] "And this is a second example sentence."
> grep("(first|lalala)", text, perl=T, value=T)
[1] "This is a first example sentence."

> grep("ab{2}", z, perl=T, value=T)
[1] "aabbccdd"
> grep("(ab){2}", z, perl=T, value=T)
[1] "ababcdcd"
> gsub("a (first|second)", "another", text, perl=T)
[1] "This is another example sentence."
[2] "And this is another example sentence."
> gsub("[abcdefgh]", "X", text, perl=T)
[1] "TXis is X Xirst XxXmplX sXntXnXX."
[2] "AnX tXis is X sXXonX XxXmplX sXntXnXX."

# finds all forms of forget (forget, forgets, forgetting, forgot, forgotten) that carry the tag _v
> grep("forg[eo]t(s|ting|ten)?_v", a.corpus.file, perl=T, value=T)

*? lazy matching (match as little as possible), e.g.

> gregexpr("s.*?s", text[1], perl=T)
[[1]]
[1] 4 14
attr(,"match.length")
[1] 4 12

# note: things that are matched are consumed and can then not be found again in the same pass
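
A small sketch of what "consumed" means in practice, using a made-up string `x`. The lookahead variant is an addition that is not in the class notes; it is a common way to get overlapping matches with perl=TRUE.

```r
x <- "aaaa"

# Plain matching consumes each match, so "aa" is found at 1 and 3, not at 2.
consumed <- as.integer(gregexpr("aa", x, perl = TRUE)[[1]])

# A lookahead (?=...) matches without consuming any characters,
# so every position where "aa" starts is reported.
overlapping <- as.integer(gregexpr("(?=aa)", x, perl = TRUE)[[1]])

print(consumed)
print(overlapping)
```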

> gsub("(19|20)[0-9]{2}", "YEAR", text)
[1] "They killed 250 people in YEAR." "No, it was in YEAR."
> #replaces only 19xx and 20xx

---

> textfile<-scan(file.choose(), what="char", sep="\n")
Enter file name: corp_gpl_short.txt
Read 9 items
> textfile<-tolower(textfile)
> textfile
[1] "the licenses for most software are designed to take away your"
[2] "freedom to share and change it. by contrast, the gnu general public"
[3] "license is intended to guarantee your freedom to share and change free"
[4] "software--to make sure the software is free for all its users. this"
[5] "general public license applies to most of the free software"
[6] "foundation's software and to any other program whose authors commit to"
[7] "using it. (some other free software foundation software is covered by"
[8] "the gnu library general public license instead.) you can apply it to"
[9] "your programs, too."
# "//W" is a typo for "\\W": it is read as a literal string, so the lines come back unsplit
> unlist(strsplit(textfile, "//W"))
[1] "the licenses for most software are designed to take away your"
[2] "freedom to share and change it. by contrast, the gnu general public"
[3] "license is intended to guarantee your freedom to share and change free"
[4] "software--to make sure the software is free for all its users. this"
[5] "general public license applies to most of the free software"
[6] "foundation's software and to any other program whose authors commit to"
[7] "using it. (some other free software foundation software is covered by"
[8] "the gnu library general public license instead.) you can apply it to"
[9] "your programs, too."
# correct: \W (any non-word character) needs a doubled backslash inside an R string
> text_split<-unlist(strsplit(textfile, "\\W"))
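
The difference between the three patterns can be seen on a single line. A minimal sketch, using one line of the sample text as a made-up input `line`:

```r
line <- "using it. (some other free software"

# "//W" is just the characters / / W, which never occur here, so nothing is split.
unsplit <- unlist(strsplit(line, "//W"))

# "\\W" splits at every single non-word character; adjacent ones leave empty strings.
single <- unlist(strsplit(line, "\\W"))

# "\\W+" treats a whole run of non-word characters as one separator.
runs <- unlist(strsplit(line, "\\W+"))

print(single)
print(runs)
```

Splitting on "\\W+" avoids the empty strings that would otherwise show up as an unlabeled entry in a frequency table.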

# (a further attempt with "//W+" returned the nine lines unsplit again; the counts below are for the "\\W" split)
> sort(table(text_split), decreasing=T)
text_split
                    to   software        the       free        and    general
          9          9          7          5          4          3          3
         is         it    license     public       your         by     change
          3          3          3          3          3          2          2
        for foundation    freedom        gnu       most      other      share
          2          2          2          2          2          2          2
        all        any    applies      apply        are    authors       away
          1          1          1          1          1          1          1
        can     commit   contrast    covered   designed  guarantee    instead
          1          1          1          1          1          1          1
   intended        its    library   licenses       make         of    program
          1          1          1          1          1          1          1
   programs          s       some       sure       take       this        too
          1          1          1          1          1          1          1
      users      using      whose        you
          1          1          1          1
# (the first, unlabeled column counts the empty strings left over when splitting on single non-word characters)

> # text_freqs: presumably the same sorted table, rebuilt after splitting on "\\W+" (no empty strings)
> text_freqs
text_split
         to   software        the       free        and    general         is
          9          7          5          4          3          3          3
         it    license     public       your         by     change        for
          3          3          3          3          2          2          2
 foundation    freedom        gnu       most      other      share        all
          2          2          2          2          2          2          1
        any    applies      apply        are    authors       away        can
          1          1          1          1          1          1          1
     commit   contrast    covered   designed  guarantee    instead   intended
          1          1          1          1          1          1          1
        its    library   licenses       make         of    program   programs
          1          1          1          1          1          1          1
          s       some       sure       take       this        too      users
          1          1          1          1          1          1          1
      using      whose        you
          1          1          1
> text_freqs[text_freqs>1]
text_split
         to   software        the       free        and    general         is
          9          7          5          4          3          3          3
         it    license     public       your         by     change        for
          3          3          3          3          2          2          2
 foundation    freedom        gnu       most      other      share
          2          2          2          2          2          2

> !(text_split %in% stop_list)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
[25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
[37] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
[61] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[73] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> text_stopremoved<-text_split[!(text_split %in% stop_list)]
> text_stopremoved
[1] "licenses" "for" "most" "software" "are"
[6] "designed" "to" "take" "away" "your"
[11] "freedom" "to" "share" "change" "it"
[16] "by" "contrast" "gnu" "general" "public"
[21] "license" "is" "intended" "to" "guarantee"
[26] "your" "freedom" "to" "share" "change"
[31] "free" "software" "to" "make" "sure"
[36] "software" "is" "free" "for" "all"
[41] "its" "users" "this" "general" "public"
[46] "license" "applies" "to" "most" "free"
[51] "software" "foundation" "s" "software" "to"
[56] "any" "other" "program" "whose" "authors"
[61] "commit" "to" "using" "it" "some"
[66] "other" "free" "software" "foundation" "software"
[71] "is" "covered" "by" "gnu" "library"
[76] "general" "public" "license" "instead" "you"
[81] "can" "apply" "it" "to" "your"
[86] "programs" "too"
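
The stop list itself is not shown in these notes. The sketch below uses a hypothetical `stop_list` and a short made-up word vector to show the same filtering step in isolation:

```r
words <- c("the", "licenses", "for", "most", "software", "and", "the", "gnu")
stop_list <- c("the", "and", "of")   # hypothetical; the list used in class is not shown

# %in% gives TRUE for every word found in the stop list; ! flips that,
# so keep is TRUE exactly where the word should stay.
keep <- !(words %in% stop_list)
words_kept <- words[keep]

print(words_kept)
```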

# LOAD an R file
source("something.r")

Jul 28

(This post documents the first day of a class on R that I took at ESU C&T. It is posted here purely for my own use.)


R Lesson 1

> 2+3; 2/3; 2^3
[1] 5
[1] 0.6666667
[1] 8

---

Fundamentals - Functions

> log(x=1000, base=10)
[1] 3

---

(formals() lists the arguments of another function, along with their default values)

formals(sample)

---

Variables

( <- allows you to save something in a data structure (variable) )
> a<-2+3
> a
[1] 5

# is for comments

whitespace doesn't matter

---
# Pick files
file.choose()

# Get working dir
getwd()

# Set working dir
setwd("..")

# Save
> save(VARIABLE_NAME, file=file.choose())
Error in save(test, file = file.choose()) : object 'test' not found
> save.image("FILE_NAME")

---

> setwd("/home/cornelius/Code/samples/Brown_95perc")
> getwd()
[1] "/home/cornelius/Code/samples/Brown_95perc"
> dir()

> my_array <- c(1,2,3,4)
> my_array
[1] 1 2 3 4
> my_array <- c("lalala", "lululu", "bla")
> my_array2 <- c(1,2,3,4)
> c(my_array, my_array2)
[1] "lalala" "lululu" "bla" "1" "2" "3" "4"

# it is possible to add something to ALL values in a vector, i.e.
my_array2 + 10

# c (concatenate/combine) makes a vector, not a list
stuff1<-c(1,2,3,4,5)

---

# sequence starts at 1 (first arg), ends at 5 (second arg), increments by 1 (third arg)
seq(1, 5, 1)
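
A short sketch of the same function with named arguments; the variable names are made up for illustration:

```r
a <- seq(1, 5, 1)                    # from 1 to 5 in steps of 1
b <- seq(from = 1, to = 10, by = 3)  # stops at the last value not beyond 10
d <- seq(0, 1, length.out = 5)       # five evenly spaced values instead of a step size

print(a)
print(b)
print(d)
```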

---

# put a file into a corpus vector
# what=real|char sep=separator
> my_corpus<-scan(file=file.choose(), what="char", sep="\n")

# unique elements in my array
unique(array)

# count elements in an array
table(array)

# sort elements in an array
sort(table(array))

---
# this tells me the position of the elements in my text that aren't "this"
> values<-which(my_little_corpus!="this")
> values
[1] 2 3 4 5 6 7 8 9 11 12 13 14

# this will produce TRUE|FALSE for my condition (is this element something other than "this"?)
> values<-my_little_corpus!="this"
> values
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
[13] TRUE TRUE

# this will return the array without "this"
> values<-my_little_corpus[my_little_corpus!="this"]
> values
[1] "is" "just" "a" "little" "example" "bla" "bla"
[8] "bla" "is" "the" "third" "line"

...

> cc<-c("banana", "bagel")
> cc == "banana"; cc!="banana"
[1] TRUE FALSE
[1] FALSE TRUE
> "banana" %in% cc
[1] TRUE
> c("bagel", "banana") %in% cc
[1] TRUE TRUE
> match ("banana", cc)
[1] 1
> match (c("bagel","banana"), cc)
[1] 2 1

# match looks for a list of tokens and returns their position in the datastructure

---
> cat(bb, sep="\n", file=scan(what="char"), append=F)
# write the contents of bb to a file, ask the user for file

moo<-scan(what="char")
# read something the user types into a var

# Clear Mem
> rm(list=ls(all=T))

---

# create vector1 (ordered)
vec1<-c("a","b","c","d","e","f","g","h","i","j")

# or
# > letters[1:10]
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

# create vec2 (vec1 in random order)
# > vec2<-sample(vec1)

---

length()
# number of elements

nchar()
# number of characters

> aa<-"know"
> nchar(aa)
[1] 4
> aa<-c("I","do","not","know")
> nchar(aa)
[1] 1 2 3 4
> lala<-c("cat","gnu","hippopotamus")
> lala
[1] "cat" "gnu" "hippopotamus"
> nchar(lala)
[1] 3 3 12

> substr("hippopotamus", 1, 5)
[1] "hippo"

# paste() joins strings, like implode(); strsplit() is the counterpart to explode()
paste (string, sep="my_separator", collapse="stuff to put in")
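
A minimal sketch of splitting and joining, with made-up strings. Note the difference between sep (placed between the arguments of one paste() call) and collapse (placed between the elements of the result when it is merged into a single string).

```r
# Split one string into its parts (strsplit returns a list, hence unlist).
parts <- unlist(strsplit("a-b-c", "-"))

# collapse joins the elements of one vector into a single string.
joined <- paste(parts, collapse = "+")

# sep goes between the arguments, element by element.
pairs <- paste("x", 1:3, sep = "_")

print(parts)
print(joined)
print(pairs)
```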

---

# percentages
x/sum(x)

barplot (c(1,2,3))

Read in corpus data and build a list of word frequencies
1) scan file
2) strsplit by " "
3) unlist to make vector
4) make a table with freqs
5) sort
6) output
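
The six steps above can be sketched as follows. To keep the example self-contained, two made-up lines stand in for the file that scan() would read; the variable names are hypothetical.

```r
# Step 1 stand-in: in class this came from scan(file.choose(), what="char", sep="\n").
corpus_lines <- c("the cat saw the dog", "the dog ran")

# Steps 2 and 3: split every line at spaces and flatten the list into one vector.
words <- unlist(strsplit(tolower(corpus_lines), " "))

# Steps 4 and 5: count the words and sort by frequency.
freqs <- sort(table(words), decreasing = TRUE)

# Relative frequencies, as in x/sum(x) above.
shares <- freqs / sum(freqs)

# Step 6: output.
print(freqs)
print(round(shares, 2))
```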

#search for strings
grep("needle", haystack)

> grep("is", text, value=T)
[1] "This is a first example sentence."
[2] "And this is a second example sentence."
> grep("And", text, value=T)
[1] "And this is a second example sentence."
> grep("sentence", text, value=T)
[1] "This is a first example sentence."
[2] "And this is a second example sentence."

gregexpr
# alternative to grep: returns a list with one vector of match positions per input string

> mat<-gregexpr("e", text)
> mat
[[1]]
[1] 17 23 26 29 32
attr(,"match.length")
[1] 1 1 1 1 1

[[2]]
[1] 16 22 28 31 34 37
attr(,"match.length")
[1] 1 1 1 1 1 1

> unlist(mat)
[1] 17 23 26 29 32 16 22 28 31 34 37
> mat<-gregexpr("sentence", text)
> sapply (mat, c)
[1] 25 30
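
The positions returned by gregexpr can be turned back into the matched strings with regmatches(), which is part of base R but was not covered in these notes. A small sketch on the two example sentences:

```r
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

# Positions (and lengths) of every match, one list element per sentence.
hits <- gregexpr("(first|second)", text, perl = TRUE)

# regmatches() uses those positions to cut the matched text out of each sentence.
found <- unlist(regmatches(text, hits))

print(found)
```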
