Monday, November 03, 2008

Haskell for counting votes!

There's apparently an important election going on in the States. There are also a few interesting articles, for example in the french maths magazine Tangente (no up to date online edition that I could find unfortunately) and in the New Scientist, on how the way you tally votes affect the outcome. Tangente has a stiking example, and I'll go through the 5 vote couting methods they outline with an Haskell implementation of each.
No pesky elephant vs donkey, here, the election is about your favorite female movie star:

data Star = MarylinMonroe | AngelinaJolie | BrigitteBardot | EmmanuelleBeart | ClaudiaCardinale
deriving (Show,Read,Eq,Bounded,Enum,Ord,Ix)


Note the "deriving Ix" (from Data.Array.IArray) so we can use the stars as array indices.

We have six elector profiles: each profile has ranked the stars in order of preferences, and we have the numbers (in millions, say) of electors for that profile. For example, 7.2 million people prefer MarylinMonroe, then Emanuelle Beart, then Claudia Cardinale, 4.8 millions prefer Angelina Jolie, etc. Here's the full data:

votes :: [(Double,[Star])]
votes = [
(7.2,[MarylinMonroe,EmmanuelleBeart,ClaudiaCardinale,BrigitteBardot,AngelinaJolie]),
(4.8,[AngelinaJolie,ClaudiaCardinale,EmmanuelleBeart,BrigitteBardot,MarylinMonroe]),
(4,[BrigitteBardot,AngelinaJolie,ClaudiaCardinale,EmmanuelleBeart,MarylinMonroe]),
(3.6,[EmmanuelleBeart,BrigitteBardot,ClaudiaCardinale,AngelinaJolie,MarylinMonroe]),
(1.6,[ClaudiaCardinale,AngelinaJolie,EmmanuelleBeart,BrigitteBardot,MarylinMonroe]),
(0.8,[ClaudiaCardinale,BrigitteBardot,EmmanuelleBeart,AngelinaJolie,MarylinMonroe])
]


With this we need a few helper method before we're ready to implement our voting algorithms.

We need a way to tally votes for stars, taking into account that several profiles may vote for the same star at the same round (Claudia Cardinale at the first round, say). We use an array with the addition as accumulation function, sorted my popularity
accumVotes ::  [(Star,Double)] ->  [(Star,Double)]
accumVotes v=let
arr=(accumArray (+) 0 (MarylinMonroe,ClaudiaCardinale) v):: Array Star Double
in sortBy (\a b -> compare (snd b) (snd a)) (assocs arr)


For some reason I needed to tell GHC about the array type, it couldn't infer it.
We need a way to get the votes for a given round (even if we'll only be using the first round this way)
roundResults :: Int -> [(Star,Double)]
roundResults r=let
v=map (\(a,b)->(b !! r,a)) votes
in accumVotes v


And a simple extraction function to get the winner's name from accumulated votes:
bestVote ::[(Star,Double)] -> Star
bestVote v= fst $ head $ (accumVotes v)


Now we're ready. The first algorith is a simple one round voting election:

oneRound :: Star
oneRound=bestVote (roundResults 0)


This makes Marylin Monroe the clear winner. But a lot of countries use a two round system: the most popular candidates after the first round make it through to the second round:

We first need a helper function that retrieves the first element from a first list found in a second list (is there a standard method somewhere?)
findFirst :: Eq a => [a] -> [a] -> a 
findFirst toFind (a:rest) = if elem a toFind
then a
else findFirst toFind rest


Then it's straightforward: we get the first two candidates after the first round, and accumulate their results over all the profiles
twoRounds :: Star
twoRounds= let
(fr1:fr2:_)=(roundResults 0)
allScores=map (\(a,b)->(findFirst [fst fr1,fst fr2] b,a)) votes
in bestVote allScores


Angelina Jolie for President! A third method we can use is to proceed by elimination: at each round, we drop the least popular candidate, until we get only one:

elimination :: Star
elimination= fst $ elimination' (roundResults 0)
where
elimination' :: [(Star,Double)] -> (Star,Double)
elimination' (a:b:[])=a
elimination' l= let
firstRound= map fst $ init l
in elimination' $ filter (\a->(snd a) >0.0 ) $ accumVotes $ map (\(a,b)->(findFirst firstRound b,a)) votes


This little recursivity yields Brigitte Bardot. Borda suggested another method: we assign a weight to each candidate depending on its ranking, that weight is used as a multiplier of the votes. First choice gets a total tally of the votes times 5, second choice 4, etc.

borda :: Star
borda = bestVote $ concat $ map (\(a,ss)-> map (\(s,mul)->(s,mul*a)) (zip ss [5,4 .. 1])) votes


And this time, Emanuelle BĂ©art comes on top! Last method we will survey is from Condorcet: we look at each match bewteen two stars in isolation. A match is won if more votes put the first star in front of the second star. We first calculate all the possible matches than consider each won match as 1 vote, and accumulate as before:

condorcet :: Star
condorcet= let
matches=[[x,y] | x <-allStars, y <- tail [x .. ClaudiaCardinale]]
matchResults= map (\match->(bestVote $ map (\(a,b)->(findFirst match b,a)) votes,1)) matches
in bestVote matchResults


And, behold, the winner through this method is... Claudia Cardinale! Five methods, five winners!
Of course these are really quick draft of voting methods, they should take into accound draws, etc... Do not use this code to elect the president of a real country!

1 comment:

marco said...

To all data analyst, data managers, software engineers and data miners.


Take the online survey for Molecular Algorithm: A Better Approach in Data Integration