Data Motility: The Materiality of Big Social Data


One invaluable source for reading the code of the 'digital human' is mainstream business media. Ever since enterprise computing spread to desktops in the 1980s, the Californian ideology 2 has imbued neoliberalism with dreams of ever more profitable information technology, filling pages from the Wall Street Journal to Wired. Now, with the rise of mobility, explosion of data, and proliferation of platforms and apps, such appraisals continue to be breathlessly dispensed. Sometimes, however, there is a critical revelation in the assessments of profitability. Consider this trenchant maxim for understanding social media and big data recently offered by Tim Worstall, a Fellow at the Adam Smith Institute: 'It's an old adage that if something is free it must be you that is the thing being sold.' 3 I find this statement richly resonant. First, intentionally or otherwise, Worstall encapsulates a radical critique of the conflation of media production and consumption that stretches from Dallas Smythe's 'audience commodity' to Maurizio Lazzarato's 'immaterial labour'. 4 Second, and more to the point here, it stands as an affirmation of Foucault's 1975 methodological imperative to look beyond the 'great texts' to the archives of everyday life when looking for the effective discourse of power as '[the bourgeoisie] said precisely what it was doing, what it was going to do, and why… [i]t stated perfectly what it wanted'. 5

I cite Foucault here not merely to bolster a blog post from Forbes.com; rather, the citation signals the theoretical paradigm fundamental to my analysis. The aforementioned quote marks the first time Foucault describes his largely overlooked but vital concept: the dispositif. He developed it in order to move beyond the myriad limitations of a discursive analysis of power toward a more heterogeneous ensemble which includes the non-discursive. Foucault's disciplinary dispositif, for example, includes i) the discursive regulations of juridical processes, and ii) the non-discursive materiality of institutions like prisons and the Panopticon. One might be tempted to say this marks an incipient 'new materialism' that has largely gone unappreciated in the later Foucauldian analysis of power.
In this article I will present such a new materialist interpretation as apposite and will use the dispositif as its conceptual frame. In doing so I present the dispositif as being positioned on the following theoretical continuum. We can start with Deleuze, who considered the dispositif a conceptual friend, 6 and saw it as inextricably intertwined with his notion of the assemblage. In turn, the assemblage (agencement in French) is the cohering concept in Actor-Network Theory, which expanded notions of agency to include nonhuman elements, 'prostheses, tools, equipment, technical devices, algorithms, etc'. 7 Understanding agency as distributed across human-nonhuman assemblages is a hallmark of new materialism. Such assemblages, as deftly outlined by Dolphijn and van der Tuin, 8 are critical to the development of materialist feminist theory (e.g. Grosz, Braidotti, and Barad, among others), which proffers a non-representational theory of power. The key here is the affordance of a dynamic role of desire, which, for materialist feminist theory, could account for a non-essentialist understanding of sexual differing, as opposed to sexual difference. Dolphijn and van der Tuin cite this specific instance to highlight a more general importance for new materialism, underlining it with a key passage from Deleuze: 'it is not the dispositifs of power that assemble [agenceraient], nor would they be constitutive; it is rather the agencements of desire [desiring-assemblages] that would spread throughout the formations of power following one of their dimensions'. 9 Such a conceptual orientation makes visible the diffusion of agency and desire/intentionality across a dispositif. If applied to the mediated environment of the digital human, the dispositif brings into focus the dynamic tension between communicative creativity and its capture, marking out the interplay between sociality and capital therein.
Here I present the dispositif of 'data motility' for such a new materialist analysis of the digital human, the discursive and non-discursive assemblage of the 'you' being sold. Motility denotes how the data you generate increasingly moves autonomously of your control. The assemblage comprising this dispositif, however, must be critically unpacked, lest it remain an analytical 'black box.' This may be desirable for corporate interests dealing in big data, but it does little for an informed understanding of life in the age of Big Social Data (BSD). A sustained and rigorous analysis is beyond the scope of this paper, and I am undertaking such a larger project elsewhere. As noted, my intention in this article is to introduce the dispositif as a conceptual frame for the study of BSD and the digital human. I will, then, identify a few non-discursive, or material, elements comprising that assemblage, namely the kind of data which makes up BSD, and the weight and structure of the cloud through which it moves. 10
Finally, I will identify for further study how this deeply recursive materiality impacts upon the life, labour and debt of the digital human under BSD. Throughout, I will be sensitive to what I see as the 'desiring-assemblage' of motility, the movement of the BSD we produce. For indeed, if it is our digital selves that are being sold, I am not suggesting we do so simply in the instrumental service of digital capital. Therein lies one of the great benefits of the dispositif as a critical methodology: its assemblage coheres in a dynamic of tension and struggle, without a singular, instrumental driving logic or a sedimented hierarchy. Practically, this means that the 'you' being sold, the social data we all generate, is motile: it flows from us, through our myriad personal technological artifacts and the material intricacies of the cloud, initially as an expression of sociality. Yet its movement is not directed by us, and is almost wholly autonomous of our control. Indeed, the data we generate is increasingly moving at the behest of capital and the state. To put a finer analytic point on this, we might make a critical distinction between motile and mobile. Thus we can consider the contained movement of data that primarily augments the profitable growth of the business of BSD and new forms of digital state surveillance as data mobility. Yet there is a glitch inherent in the movement of data, as the material environment of the cloud results in the seemingly self-directed movement of data itself. I read this both as a metaphor of the inherent sociality of data, and as a practical example of how, invariably, all data enclosures leak in all directions. As such, data motility signals a possible route for the progressive becoming of a new data commons. It is my contention that the dispositif of data motility, along with its counterpoint mobility, can help us understand our collective stakes in the kinds of contestation inherent in data motility.

What is a dispositif?
One of the most compelling reasons for using the dispositif to conceptually frame the life of the digital human under BSD is the importance it accords both to materiality and to thinking in terms of a complex, heterogeneous ensemble. It is important to note that the dispositif marks an overtly politicised shift by Foucault, away from the structuralism and hermeneutics that defined his work through Archaeology of Knowledge. In both his engaged political projects of the early 1970s, like the Groupe d'Information sur les Prisons, and in his writings and interviews, Foucault acknowledged the methodological malaise that arose from a solely discursive focus as well as from theorising power as domination. In 1975 Foucault admitted, 'I was in a dead end. Now, what I would like to do is to try to show that what I call the dispositif is something much more general than the episteme'. 11 Foucault's turn to the dispositif began in Discipline and Punish, and became overt in History of Sexuality, vol. 1, given that the organising concept of the latter was the dispositif de sexualité. It is unsurprising that this was overlooked by most English-language interlocutors, because dispositif was inconsistently translated (as apparatus, mechanism or deployment), which obfuscated its conceptual importance.
What should be clear, however, is the decisive move beyond the symbolic and representation. When describing his approach in Discipline and Punish, Foucault noted that his analysis now included a 'thoroughly heterogeneous ensemble consisting of discourses, institutions, architectural forms, regulatory decisions, laws, administrative measures, scientific statements, philosophical, moral and philanthropic propositions'. 12 This is the point where Foucault fully nuances power as symbolic and material, as relational, as microphysical, as circulating in networked formations, and not as simply a repressive force which says 'no'. Hence the importance of the dispositif de sexualité, not as a repressive Victorian ideology, but as a discursive-non-discursive matrix through which a normative (and, consequently, 'abnormal') sexuality becomes visible and articulable. This indicates how the dispositif is to be understood as an analysis of power. In this reconceptualisation, Foucault rejects power as that which is centrally located, in the mode of production or in the state; nor is it a fungible commodity possessed by individual subjects and wielded like a club. Instead, it is expressed in heterogeneous ensembles, in complex assemblages of the discursive and non-discursive, of power and knowledge, through which processes of subjectification or individuation unfold.
I want to make two more quick points before identifying the non-discursive elements of the dispositif of data motility. In one of his first references to a concept that would retain his sustained interest, Foucault described biopower as a dispositif: 'The biological traits of a population became relevant elements for economic management, and it is thus necessary to organise around them a dispositif which assures not only their subjugation, but the constant increase of their utility'. 13 I cite this because the productive dynamic of this dispositif is continued in that of data motility, wherein our quotidian actions have become discrete quanta, visible through their digital traces, and constantly subject to circulation in ways that increase their 'utility.' What is missing from biopower, and from Foucauldian dispositifs in general, is a recognition of the intimate relation between the body and mediating technology. This aporia is addressed by the dynamic presence of data motility. The other point concerns the polyvalent nature of power expressed above. Some readers may be thinking that biopower, in fact, was used by Foucault to indicate a rather repressive force, and they would be right. Lazzarato again helps here, noting that we must distinguish biopower from biopolitics. Specifically, biopower is a dispositif of control and domination, whereas biopolitics is a domain of creativity and resistance. 14 It is in following this model that I distinguish the contained and constituted flow of data mobility from the deterritorialising and nomadic flow of data motility. The dispositif, then, is not underwriting a utopian analysis, seeking out only lines of optimism in these heterogeneous ensembles.
Rather, it is riven by struggle, and contained within the assemblage of a dispositif is both an analytic and a diagnostic of power, enabling a critique of what we are and identifying what we might become. As Deleuze notes, the analytic of power examines 'what we are (what we are already no longer)' while the diagnostic considers 'what we are in the process of becoming'. 15 The dispositif, then, identifies the ways in which we are amidst relations of domination, but not in a manner that leaves us permanently trapped. What is most apropos to the study of BSD here is the role of the archive. 16 On the one hand the archive is the sedimented part, like the nineteenth century prison studied by Foucault, the realm of the analytic of power. Yet Foucault equally identifies the archive as that 'which is at the same time close to us, but different from our present; it is the border of the time which surrounds our present'. 17 It is a liminal zone between what is sedimented and becoming. The archive, which is certainly one compelling way that digital traces of BSD can be framed, is a key fulcrum point in the strategic value of the dispositif. The motility (the movement) of that data can both reinscribe and reproduce relations and patterns of domination, and provide the material for creative resistance and becoming. In this sense, BSD is a site of struggle, and the manner in which our data circulates therein is of vital importance.

The Materiality of the cloud
Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis. 18
The dispositif of data motility makes visible and enunciable the movement and machinations of BSD. A focus on its non-discursive, or material, qualities brings into focus the nearly inconceivable volume of BSD, and the velocity with which it is both captured and grows. Size and speed are key factors in its valorisation, and while economic value drives capital to maniacally increase its capture and analysis, it is the pursuit of social and cultural value that drives its generation. By foregrounding these tensions, I will try to make sense of data motility first by examining the kind of data that comprises BSD. One could say this means delineating BSD via the materiality of its discursivity, which uneasily coexists in forms both machine-readable and human-readable. We will then examine the architectural form in which it is stored and through which it gains motility. In short, we will introduce both the kinds of data and the databases through which motility transpires. What follows, then, is an introduction to the materiality of the kind of BSD produced, and the structure of its archives.
Even a cursory quantification of the BSD produced by the digital human is challenging, given its rapid growth. Only ten years ago, humanity collectively generated about five exabytes of data per year. For clarification, one exabyte is the equivalent of one million terabytes. In 2012, we generated 2.7 zettabytes (2,700 exabytes), and it is predicted that by 2020 we will reach 40 zettabytes (40,000 exabytes) annually. That is an 8,000-fold increase in less than two decades. 19 I should distinguish what I am calling BSD from the broader category of 'big data'. The latter is more inclusive, entailing sensor data from industrial and domestic networks, RFID tags and 'The Internet of Things', financial markets, and big science projects, among others. While all of this data contributes significantly to the quantification of the world in which we live, the focus here is on the social data generated by the digital human. BSD, then, comes from the mediated communicative practices of our everyday lives, whenever we go online, use our smartphone, use an app or make a purchase. Consider just three of the most popular sites. Google, according to the most recent available statistics (from 2008), was processing 20 petabytes per day. In 2012 Facebook users were sharing four billion pieces of content, 'liking' three billion things and uploading 300 million photos per day. Overall, Facebook's one billion users generate 500 terabytes of social data every single day. Twitter sees nearly 200 million tweets per day. Finally, there are now five billion people calling, texting, tweeting, browsing, posting and generating content on their phones. 20
Schematically, then, we see the smartphone as a key new vector of mobile communicative sociality, and we see that user-generated content primarily transpires on proprietary platforms. What might we glean from the materiality of that data? Before the rise of social media and mobile computational power, much of the information digitally stored was structured data. This is data input into fixed fields, like columns or rows, each of which is clearly defined, as are their relations to one another. Spreadsheets are a familiar form of this tabular organisation; the relational database is its quintessential form. The information, or at least each discrete quantum, is simple and uncomplicated, insofar as it is relevant only to its field. Think of demographic information, like your address or date of birth, input into a spreadsheet, as singular forms of structured data. This format means structured data can be accessed and queried by, for example, SQL (Structured Query Language) because of its clearly identifiable and pre-defined schema. Prominent kinds of structured data held in relational databases include retail transaction data, financial market data, and industry, medical or pharmaceutical research data. Several things should be emphasised about structured data and its operating environment, called a Relational Database Management System (RDBMS). First, it was long the preferred form of data, especially for corporate IT, because its highly predictable structure allows it to be efficiently processed. This efficiency results from the data being structured not for human but for machine readability. Such structured data often would be input by a data entry clerk into a bespoke and costly environment like those provided by Oracle. I make this point to emphasise that structured data is typically instrumental, highly focused, and subject to a pre-defined data model, always intended for efficient processing. While its content may very well represent elements of everyday life, it would not typically be produced, as data, through quotidian communicative sociality. In other words, traditional structured data is more typically composed for a functional purpose, and from inception is structured in a manner that machines like. Structured data, then, is structured vis-à-vis the symbolic realm of computation, of codes, programs and algorithms.
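To make the notion of a pre-defined schema concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name, column names and sample records are hypothetical, chosen only to echo the demographic example above; the sketch illustrates the general idea of structured data and SQL, not any particular corporate RDBMS.

```python
import sqlite3


def build_people_table():
    """Create an in-memory relational table whose schema is fixed in advance."""
    conn = sqlite3.connect(":memory:")
    # The schema is declared before any data exists: every record must
    # supply exactly these fields, each with a declared type.
    conn.execute(
        "CREATE TABLE people ("
        "name TEXT NOT NULL, city TEXT NOT NULL, birth_year INTEGER NOT NULL)"
    )
    rows = [  # hypothetical sample records
        ("Ada", "London", 1985),
        ("Ben", "Leeds", 1990),
        ("Cy", "London", 1978),
    ]
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    return conn


def names_in_city(conn, city):
    """Query the structured data with SQL; this works because the fields are known."""
    cur = conn.execute(
        "SELECT name FROM people WHERE city = ? ORDER BY name", (city,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_people_table()
    print(names_in_city(conn, "London"))
```

The query can be answered efficiently precisely because each datum sits in a field defined ahead of time; a free-form status update or photograph offers no such fields to select on.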
Humans, on the other hand, largely communicate in the symbolic realm of cultural meaning.We do so regardless of the specific non-human or technological elements with which we are assembled-although it must be noted that the historico-medium specificity has profound epistemological and ontological effects.
The materiality of our augmented communicability, manifested in BSD, attests to this key historical difference, and it illuminates just how data motility transpires. BSD is not new insofar as it emanates from the kind of communicative sociality that has always been endemic to the human condition. What is new, and why it is of such importance to media theory, are the particularities of its technological mediation. The newness of BSD, then, first comes in the form of the quintillions of raw data points being generated every day, which are captured and contained primarily by capital and the state, and proprietarily available for potentially never-ending future analysis. What is also new is that even though it is generated through personal computational devices, it is not in the efficient, machine-readable form of structured data. There is a longstanding rule of thumb that upwards of 80% of the data we generate is, in terms of computer processing, unstructured. 21 To clarify, unstructured data is not produced in pre-defined fixed fields, residing in relational databases. At its point of generation, user-generated social data does not conform to a pre-defined schema or data model for processing, even though it is generated in the structured space of its platform. Rather, it is generated as informational or affective symbolic content, the result of spontaneous, contingent, free-form communicative sociality. BSD is unstructured data because it comprises the traces of the cultural life of the digital human. These are the textual objects that you generate in a blog, social media, a search, a message or an app; they are also the bitmap objects, the images, photos and videos that you send, post, like or tag. Some debate the validity of the term 'unstructured' because if data were truly unstructured it would be unreadable gibberish in any format, by humans and machines. Further, a strong claim can be made that data is always 'structured' when entered into any digital realm. Every website, platform, or application is always built on a template created by software and information architects. The insistence on the fundamentally 'structured' nature of data is a shibboleth among proponents of software studies, ranging from Galloway's 'protocological wrappers' to Mika's application of the semantic web to social networking to Gehl's 'real software abstractions'. 22 These important contributions, however, can unintentionally obscure key changes in the material makeup of BSD, especially vis-à-vis its computational infrastructure. This distinction is most clearly exemplified by contrasting the newer Hadoop cluster with the older RDBMS environment. There is great analytical value in retaining the working distinction made by most computer scientists between structured, unstructured, and semi-structured data. This is a distinction upon which I will build to better enact a materialist analysis of data technologies, that is, to outline what is new about the big social database as a medium. Such a distinction helps to illustrate the new paradigm of computational power (social, political and economic) that emerges in the big data-crunching environment of Hadoop.
To risk further complicating matters, there is a third category: semi-structured data. This typically refers to formats like XML (Extensible Markup Language) and its simpler, JavaScript-derived counterpart JSON (JavaScript Object Notation), which encode web documents in a manner both human- and machine-readable. These are basic tags and markers that give some structure to documents and facilitate information exchange. This is extremely important for downstream processing and aggregation, the very interchange of heterogeneous data sources that is integral to data motility. These distinctions, then, regardless of their disputed status, help to delineate the important material differences marking BSD. The challenges that these different forms of data create for their efficacious processing are important for my critical analysis, as they help circumscribe the very conditions of motility.
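As a small illustration of what 'semi-structured' means, the sketch below encodes one hypothetical social media post in both JSON and XML and reads each with Python's standard library. The field names (user, text, tag) and the sample content are assumptions made for the example, not a format used by any actual platform.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical post, once as JSON and once as XML. Both are legible
# to a person and parseable by a machine, yet neither requires a database
# schema to be declared in advance.
JSON_POST = '{"user": "ada", "text": "hello world", "tags": ["greeting", "test"]}'
XML_POST = (
    '<post user="ada"><text>hello world</text>'
    "<tag>greeting</tag><tag>test</tag></post>"
)


def parse_json_post(raw):
    """Read the JSON encoding into a plain dictionary."""
    doc = json.loads(raw)
    return {"user": doc["user"], "text": doc["text"], "tags": list(doc.get("tags", []))}


def parse_xml_post(raw):
    """Read the XML encoding into the same dictionary shape."""
    root = ET.fromstring(raw)
    return {
        "user": root.get("user"),
        "text": root.findtext("text"),
        "tags": [tag.text for tag in root.findall("tag")],
    }


if __name__ == "__main__":
    print(parse_json_post(JSON_POST))
    print(parse_xml_post(XML_POST))
```

That two different encodings reduce to the same record is what makes such formats useful for exchanging and aggregating data across heterogeneous sources.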
For the moment, let's put aside the challenges the average digital human faces in translating and comprehending the interplay of the different forms of data she produces. Instead, let's consider the challenges faced by big data companies and social media giants like Google and Facebook in translating the unstructured data that humans produce into structured data that can be processed at speed and on a vast scale. For BSD makes particular infrastructural demands. One way to understand this paradigmatic shift is to trace a material link in the explosion in BSD back to a desktop-bound curiosity, the University of California-Berkeley's SETI@home (Search for Extra-Terrestrial Intelligence). This distributed computing project was one of the first examples of internet-scale applications, established back in 1997. Within a few years distributed computing took a pronounced cultural turn: Napster emerged, and its peer-to-peer file sharing successors, be it the BitTorrent protocol of Pirate Bay or the file-hosting service of Megaupload, made the widespread exchange of data a prominent new mediated practice. This was further intensified by the emergence of Facebook, YouTube, Twitter, social gaming like Farmville, and e-commerce like eBay, applications and platforms that all scale to global reach and demand. When we add to that the rise of mobile devices and ubiquitous connectivity, the environment for the quotidian generation of BSD, be it structured, unstructured or semi-structured, becomes clearer.
The internet-scale applications of social media via mobile devices alone created data footprints that were ill-fitted for traditional RDBMS, not just in terms of volume, but because of the need to integrate different kinds of data from different sources. In short order, there emerged an urgent need for the ability to access and aggregate multiple data sets on a vast scale, necessitating changes in computer architecture and network capacity in a manner reflecting this rise. I should add that Foucault conceived dispositifs as assemblages which cohere in response to an urgent need. He writes, '[the dispositif is a] formation which has as its major function at a given historical moment that of responding to an urgent need. Thus the dispositif has a dominant strategic function'. 23 It is worth recalling again his dispositif of biopower and the urgent need to which it responds: 'the assimilation of a floating population found to be burdensome for an essentially mercantilist economy: there was a strategic imperative acting here as the matrix for a dispositif'. 24 But just as with the contradictions and tensions between biopower's dispositif of control and domination, and the creativity and resistance of that of biopolitics, I will suggest that the urgent needs of Google et al. differ considerably from those of the digital human.
Google is at the architectural heart of the rise of this data-intensive computing environment. As its search engine became the near de facto mode of seeking internet-based content, the operational demands placed on its PageRank algorithm intensified. Already by the early 2000s, Google was struggling with its core business: the daily indexing of the entire web necessary for optimising the aforementioned algorithm. In order to cope, it radically reconfigured its approach, shifting to parallel processing distributed across vast networks. A series of papers in 2003 and 2004 by Google engineers helped to rearticulate that company's hardware and software, and in the process, map out the environment in which BSD would flourish. In short, Google established a new paradigm for the processing of big data. By extrapolating from the fundamentals of distributed computing, they outlined a platform on which massive indexes of the Internet could be built for real-time analysis. Think back to the SETI@home project, which ingeniously managed a computational task that from a central site would have been prohibitively expensive: analysing the universe for signs of extraterrestrial life. By taking vast observational data from the Arecibo radio telescope, breaking it down into small chunks, and then having it analysed by home desktop computers, it proved the practical value of distributed processing. Similarly, Google needed to process the search requests that were scaling up at a rate similar to that of data in general: from 9,800 requests daily in 1999 to 60 million in 2000 to 200 million in 2004 to 4.7 billion in 2011. By developing the Google File System and MapReduce, which are the core of the Google app engine, it addressed this urgent need to 'parallelize the computation, distribute the data, and handle failures'. 25
The Google File System is a proprietary scalable distributed file system, designed to run on inexpensive commodity hardware, be highly fault tolerant and able to process massive and expanding amounts of data. 26 MapReduce establishes the computational paradigm for handling the processing and generation of Google's large data sets, comprising raw data gathered from web crawling, web request logs, derived data summarising search queries, pages crawled and the graph structure of web documents. The paradigmatic breakthrough of MapReduce is in making practical the clustering of large numbers of commodity PCs for automatic parallel and distributed computation on a large scale. 27 So it is in Google's proprietary environment that the new paradigm in which BSD would flourish was established. Just as with SETI@home, massive data calculations are broken into small chunks across many computers, and when completed are reassembled into a single dataset. This is the basic design behind Google's scores of proprietary, warehouse-sized computing facilities which operate like one giant mainframe.
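The break-into-chunks-and-reassemble pattern described above can be sketched as a toy word count, the example conventionally used to introduce the MapReduce model. What follows is a single-process illustration under stated assumptions: the function names are hypothetical, and it is neither Google's proprietary code nor Hadoop's API. In a real cluster each chunk would be handled by a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce in sequence over a list of text chunks."""
    mapped = []
    for chunk in chunks:  # each iteration stands in for a separate worker
        mapped.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "The data the"]))
```

Because each call to map_phase depends only on its own chunk, the chunks can be processed in any order and on any machine, which is what allows the work to be spread across commodity hardware.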
Because Google published key papers detailing its file system and MapReduce, albeit while keeping its code a proprietary secret, others were able to develop the basic structure of the file system and processing. Hadoop, housed under the not-for-profit Apache Software Foundation, developed an open source implementation of Google File System and MapReduce. While Hadoop was built and is maintained by a global community of participants, there are myriad for-profit organisations that run the framework for their own proprietary large distributed computation platforms.
Hadoop and these related companies provide the software and data processing systems that enable the distributed computing that transpires on 'the cloud.' Reckoning the competing definitions of the amorphous computing cloud recalls Joseph Conrad in Lord Jim: 'the simplest impossibility in the world; as, for instance, the exact description of the form of a cloud.' 28 Yet this brief material overview reveals several key elements that can be described, and which detail this paradigm shift as it relates to data motility. What has changed, and is important about the Hadoop cloud as a computing environment for BSD, is i) the scalability of computing, ii) the new economics of storing data, iii) the ability to continuously question raw data, and iv) the emergence of raw data as a heterogeneous source for potentially endless aggregation. Amr Awadallah, a former Yahoo engineer and CTO of Cloudera, a Hadoop-based private company, has cogently outlined these elements. The first depends upon the aforementioned distributed model. What must be stressed is the computational power that comes from cluster architecture; that is, when a large number of computers are networked to run as if they were a single system. A simple example demonstrates the multiplicative power of the cluster. Say the single hard disk of a commodity PC can process about 100 megabytes per second (roughly 1 gigabit per second), and one server holds 12 disks, and a rack holds 20 servers; that is already a processing speed 240 times faster than the single PC. Now the average cluster holds six racks, making it 1,440 times the processing speed. If you move into the realm of large clusters, which big data and social media companies would typically deploy, you are suddenly processing 4.8 terabytes per second, some 48,000 times faster than a single PC. In practical terms, a large cluster can process in one second what would take 13 hours on a single PC. 29
In the simplest terms, the larger you scale up, the faster your processing speed. The computational power of the cluster architecture is a potential resource awaiting more widespread and non-corporate deployment, and could enable a more inclusive and distributed community-based access to BSD.
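The cluster arithmetic can be checked with a few lines of Python. This is a back-of-envelope sketch, not a benchmark. The per-disk rate of 100 megabytes per second is an assumption: it is the value implied by dividing the quoted 4.8 terabytes per second by the quoted 48,000-fold speedup. The figure of 200 racks for a large cluster is likewise derived from those numbers rather than stated in the text.

```python
# Assumed per-disk rate, implied by 4.8 TB/s divided by 48,000.
DISK_MB_PER_S = 100
DISKS_PER_SERVER = 12   # from the text
SERVERS_PER_RACK = 20   # from the text


def speedup(racks):
    """How many single disks' worth of throughput a cluster of `racks` racks offers."""
    return DISKS_PER_SERVER * SERVERS_PER_RACK * racks


def throughput_mb_per_s(racks):
    """Aggregate throughput in megabytes per second, assuming perfectly linear scaling."""
    return DISK_MB_PER_S * speedup(racks)


def racks_for_speedup(target):
    """Number of racks needed to reach a given speedup over one disk."""
    return target // (DISKS_PER_SERVER * SERVERS_PER_RACK)


if __name__ == "__main__":
    print(speedup(1))                   # one rack
    print(speedup(6))                   # an average cluster of six racks
    large = racks_for_speedup(48_000)   # racks implied by the quoted speedup
    print(large, throughput_mb_per_s(large) / 1_000_000, "TB/s")
    print(round(48_000 / 3600, 1), "hours for one PC to match one cluster-second")
```

The scaling here is linear in the number of disks: doubling the racks doubles the throughput. Real clusters fall somewhat short of this ideal because of network and coordination overheads, which the sketch ignores.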
In addition to upwardly scalable processing speed is a new economics of storage costs.In 1980, it cost $193,000 to store one gigabyte of data; that would make one of today's 16 gigabyte flash keys worth just over $3 M. By 1989 it was $36,000 per gigabyte, down to $43 in 1999, and about six cents today. 30Whereas an older corporate computing paradigm operated on Return on Investment (ROI) as a function of the cost of storing that byte, now it is Return on Byte (ROB), and given the relative pittance for storage, the basic question is how much value is created from the data you collect?This key change in the materiality of data storage carries a straightforward new imperative: collect more data.Further, as Awadallah notes, this new economics of 'keeping the data alive' also underpins the third fundamental shift of retaining the 'original raw event data'. 31The cluster architecture, then, enables a new economy which maximises both the storage capacity and processing speed of data, and retains data in its original high-fidelity, unadulterated form for continuous future queries.In other words, structured, unstructured, and semi-structured data are always available in their original form.In the traditional RDBMS, raw data is moved from the storage-only to the computational grid, where it is converted into the required structured form for database processing.But it is extremely expensive to reverse the process and retrieve the original data for further processing.The Hadoop environment, however, makes no such distinction between storage and computation in its cluster architecture.Indeed, it requires no pre-defined schema or structure for its data, which can be taken from smart phones, RFIDs, or the internet and dropped into the Hadoop cloud.This flexibility greatly diminishes the former challenge of processing structured, unstructured and semi-structured data in the same environment. 
32 Quite to the contrary, the heterogeneity of data becomes a potential virtue, insofar as it vastly widens the conditions of processing possibilities. With the imperative to collect more data built into the material structure of a Hadoop environment, the ROB ratio becomes extremely attractive. That is because in straightforward economic terms, the original raw event data is now forever. The Hadoop-structured cloud affords the cost-effective ability to store all forms of data now and process it later, and then process it again and again. The implications for BSD are significant. It means that data need no longer be considered a monolithic block for pre-determined processing, as was the case with most RDBMSs. It means that what is known as 'data exhaust' (the myriad forms of data which are stored temporarily and then deleted) will increasingly be a thing of the past. The archives of the digital human, as such, will continue to grow apace. The breadth and depth of the totality of BSD becomes in practice discrete data points wherein the possibilities for aggregation and analysis depend only on the imaginary of those querying the data. In this sense, surely it is critical that this questioning not be left exclusively in the realm of marketers. A very brief look suggests an avalanche of ideas, all designed primarily to increase our efficacy (read profitability) as consumers.
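The storage figures cited above can be checked with a few lines of arithmetic. What follows is only an illustrative sketch: the prices are the ones quoted in the text, 'today' means the article's own present, and the names `PRICE_PER_GB`, `storage_cost` and `fold_decrease` are hypothetical helpers introduced here for illustration. At the 1980 price, 16 gigabytes comes to 16 × $193,000 = $3,088,000, which is the 'just over $3M' mentioned above.

```python
# Per-gigabyte storage prices as quoted in the text (US dollars).
# "today" is the article's own present, not the reader's.
PRICE_PER_GB = {
    "1980": 193_000.00,
    "1989": 36_000.00,
    "1999": 43.00,
    "today": 0.06,
}


def storage_cost(gigabytes: float, period: str) -> float:
    """Cost of storing `gigabytes` at the price quoted for `period`."""
    return gigabytes * PRICE_PER_GB[period]


def fold_decrease(earlier: str, later: str) -> float:
    """How many times cheaper a gigabyte is in `later` than in `earlier`."""
    return PRICE_PER_GB[earlier] / PRICE_PER_GB[later]


if __name__ == "__main__":
    # Price of a 16 GB flash key at each of the quoted per-gigabyte prices.
    for period in PRICE_PER_GB:
        print(f"{period:>5}: 16 GB would cost ${storage_cost(16, period):,.2f}")
    # Overall collapse in the cost of keeping a gigabyte 'alive'.
    ratio = fold_decrease("1980", "today")
    print(f"1980 to today: roughly {ratio:,.0f} times cheaper per gigabyte")
```

On these figures the per-gigabyte price falls by a factor of more than three million, which is the material basis for the shift from ROI to ROB: when storing a byte costs next to nothing, the only remaining question is what value can be extracted from it.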
The material elements comprising data motility are highly conducive to the needs of capital. The 'Powered By' page on the Hadoop Wiki reads like a who's who of social media, e-commerce, advertising, marketing and broadly defined BSD-related companies. 33 Yahoo runs Hadoop with over 40,000 nodes, including a single 4,500-node cluster. eBay runs it for search optimisation and research; Last.fm and Spotify for data aggregation, reporting and analysis. Netflix also uses Hadoop to process the vast user-data it gathers from streaming programming, which it uses to integrate consumption even more deeply with production. Facebook runs the world's largest Hadoop cluster, about 100 petabytes and capable of ingesting 500 terabytes of new data every day. 34 'We also use a version of Hadoop and Hive to run the business, including a lot of our analytics around optimising our products, generating reports for our third-party developers, who need to know how their applications are running on the site, and generating reports for advertisers, who need to know how their campaigns are doing. All of those analytics are driven off of Hadoop, HDFS, Hive and interfaces that we've developed for developers, internal data scientists, product managers and external advertisers.' 35 What Parikh highlights (optimising reports, generating app reports and reports for advertisers) are core practices of BSD analytics. The material infrastructure and practices we have been outlining are a necessary precondition for BSD analytics, be it as data mining, sentiment analysis, or predictive analysis.
These new core practices are extensions and intensifications of the kinds of surveillance strategies of data exploitation so comprehensively outlined by Andrejevic and Fuchs. 36 While such data capture is manifest, the heterogeneity of the dispositif demands we consider BSD analytics as just one specific modality of data motility: that of contained mobility. For indeed, this data flows through corporate enclosures, in a manner not directed by the digital human who generated it. But in critically unpacking this contained data mobility we need to consider the breadth of the heterogeneous ensemble through which it flows, to discover other intentionalities and desires which may indicate more liberatory possibilities of data motility.
Acxiom is a little-known but major American data broker which collects consumer data, information from financial service companies, court records, and government documents. As recently outlined by the Electronic Frontier Foundation, 37 they have partnered with Facebook. For example, Facebook will identify a desired audience, say potential car buyers. Acxiom will then scour its databases and create a list of everyone who meets those criteria and provide it to Facebook. That list will then be delimited by Facebook to include only its users, which in turn will be served up to the car manufacturer so it can effectively produce appropriate ads. Finally, Facebook will display that ad alongside the targeted user's newsfeed. There are a number of things worth emphasising in this example of data motility. For one, it highlights the ever-multiplying stages of motility, of the movement of the data we create but do not direct. First, the digital human generates the structured data of government records, financial documents and consumer behaviour. Second, this data moves from its initial database to those of Acxiom. Third, these discrete elements are moved again at the behest of Facebook, in aggregation by Acxiom. Fourth, they are collectively moved again to Facebook.
Fifth, they move from Facebook to the auto manufacturer. Sixth, the discrete points of data users once generated, now profoundly processed and aggregated, are pinged back to those same digital humans in the form of a targeted ad. Finally, user responses to those targeted ads become a new source of BSD in a deep layer of recursivity: 'Facebook then provides the company with an aggregate report about how an ad performed, which might include information about how many people clicked on it, their locations, ages, genders, etc'. 38 This latter point leads to the next generation of BSD analytics that Facebook is unveiling, in a formal partnership with Acxiom, Datalogix, Epsilon and BlueKai. Acxiom, like Datalogix and Epsilon, has its own databases, culled from loyalty cards, purchase-based data and other comprehensive demographic databases. BlueKai, however, contributes uniquely to an even more heterogeneous and frictionless flow of BSD, specialising in tracking cookies which collect information about all the sites you visit when not on Facebook. Upon returning, an HTML pixel web bug enables Facebook to process the data about all the other sites you visited. This provides the social media giant with a comprehensive digital trace of your online predilections, which, in turn, can be analysed and aggregated with all the aforementioned data now in their proprietary grasp. This 'cookie matching' makes you even more valuable for advertisers who want to target you on Facebook.
In order to facilitate this next stage of heterogeneous BSD integration, Facebook has purchased Atlas, an ad-server formerly owned by Microsoft. As Advertising Age notes, this is a clear sign of Facebook's intention to become an online ad server behemoth, second only to Google's DoubleClick. 39 First, this new ad server will consolidate advertiser connection to Facebook's display tools and exchange, and the subsequent measurement of onsite ad effectiveness. But given the increasingly integrated and heterogeneous flow of information and collection points, the quantification of effectiveness is no longer limited to whether or not you click on the targeted ad. Your consumer habits can now be tracked outside of your Facebook-based activities via the myriad databases of the new array of partners, for example, via your general online habits or when you use your credit card. In turn, this can be analysed via forms of textual analysis of the user content you generate on Facebook. These kinds of sophisticated BSD analytics facilitate a particular kind of data motility which seeks to quantify the affective sociality of advertising. That is, it hopes to measure not just your click-through rate but the impression of ads. As Mark Zuckerberg stated to investors, the strategic intention is to 'help connect ad impressions and purchases'.
40 These specific material developments and configurations facilitate the evermore comprehensive capture of data for corporate purposes. There are also regulatory decisions and laws enabling a more frictionless flow between those companies and the state. For example, the US House of Representatives recently passed by a wide margin the Cyber Intelligence Sharing and Protection Act (CISPA). This bill would allow companies to monitor user actions that leave a digital trace and share it with the government, without a warrant and without ever needing to notify you that it possesses your data, regardless of how sensitive it might be. 'This means a company like Facebook, Twitter, Google, or any other technology or telecoms company, including your cell service provider, would be legally able to hand over vast amounts of data to the U.S. government and its law enforcement – for whatever purpose it deems necessary – and face no legal reprisals'. 41 Further, such state compulsion to share data without consent or knowledge would not be subject to the Freedom of Information Act, which otherwise would enable the public to request that the government release information. It must be stressed that at the time of writing this bill remains in legislative limbo, with the US Senate refusing to vote on it due to concerns over insufficient privacy protection, and to political infighting resulting from the NSA revelations.
Nonetheless, there are other examples around the world. India has invoked the Central Monitoring System, which will allow the government and its agencies to monitor all telecommunications and Internet communications within that country.
According to The Centre for Internet & Society this enables a general environment of e-surveillance, establishing central and regional databases, allowing central and state law enforcement agencies to intercept and monitor communication, and undertake call data record analysis and data mining. 42 The rise of such new regulations across the globe and the disturbing practices of NSA data capture and analysis indicate the need for critical debate around privacy in the age of BSD. Such new laws are justified by the purported need for cybersecurity. These are key issues in need of informed consideration but are beyond the scope of this article. Instead, I want to posit this less in discursive or ideological terms of security, and more as an effect of the material elements of the dispositif of data motility. Given the persistence and permanence of our broadly generated digital traces, and the material changes enabling the intensive and extensive processing of different forms of data, there should be no surprise that an ever-more frictionless flow becomes an urgent need for both the state and capital. These common interests are clearly visible on the surface. Lobbyists in favour of CISPA outspent opponents by a factor of 140, and include major tech, telecommunication and financial corporations, including AT&T, Comcast, Verizon, Time Warner Cable, National Cable and Telecommunications Association, Cellular Telecom & Internet Association, Oracle, Intel, IBM, and the American Bankers Association. 43 To suggest that this comprises a cabal that planned and orchestrated this widespread and frictionless flow of BSD is to miss the point of a dispositif. Rather, look to the cohesion, the binding of strategic interests under the logic of data that can retain its information as it moves and is processed in myriad and ongoing iterations.
There is one more potential regulatory change that must be mentioned. The precise articulation of property rights calibrates the control exercised over the flow of data. Intellectual property law and user agreements are key regulations which guarantee the controlled flow of BSD through a highly proprietary environment. In a social media context, one owns the data one generates, insofar as a copy can be demanded from Facebook. That does nothing, however, to limit the secondary rights held by the social media giant which moves, mines, processes and aggregates your data at will. The status of data ownership in a cloud environment was brought further into question with the FBI-led case against Megaupload. When Megaupload's servers, holding about 25 petabytes of data, were unplugged last year, the data property rights of those utilising Kim Dotcom's services were seemingly abrogated. One such user, Kyle Goodwin, used Megaupload to store video and files for his small regional website that covers high school sports. He has to date unsuccessfully sought the retrieval of his data, and subsequently taken legal action, arguing that the US government, in its pursuit of Megaupload, had not taken reasonable steps to protect third-party property rights in cloud computing storage. The US government has strongly opposed Goodwin's efforts. According to Goodwin's lawyers, '[a]pparently your property rights "become severely limited" if you allow someone else to host your data under standard cloud computing arrangements'. 44 Further, even if the government's position does not stand up to legal challenge, it has indicated it will implement administrative measures whereby the data would first need to be reviewed by the government or a third party to determine if any of it infringed copyright. It is worth noting that the Motion Picture Association of America has filed a brief as a non-party participant in the case, in support of that system.
45 These examples, from Facebook and Megaupload, demonstrate the prominence of data mobility as a modality of control, surveillance and profit, which should not be underestimated. But what remains in the dispositif of data motility?
For a Data Debt Jubilee?
For the dispositif to be a sharp tool for critical analysis, its heterogeneity must be foregrounded, both in terms of its discursive and material elements, and in the differentiated power and knowledge relations it engenders. The Hadoop material structure does not necessitate a proprietary environment. It is the strategic interests of big data and social media companies that result in the parsing of data for a controlled flow. Yet there is nothing in the material environment of BSD which leaves it exclusively bound to an algorithmic power of profitable and productive control. Just as biopower's dispositif of control and domination must be differentiated from the biopolitical domain of creativity and resistance, a similar distinction must be made between data mobility and data motility. I suggest differentiating the proprietary environment as one of mobility, wherein the flow of data is motile vis-à-vis its being wholly autonomous of the control of those who generated it, but ultimately directed by social media and big data companies which calibrate its flow for maximum profitability. Indeed, it could be stated that the state and capital embrace controllable data mobility but fear and loathe autonomous data motility.
Let's go back to the material phenomenon which inspired the conceptualisation of BSD via the dispositif of data motility. One of the defining features of cloud architecture was the virtual disappearance of the physical boundaries containing your data. There are, of course, very clear material boundaries that remain, but they can be literally distributed across the globe. As well, the cloud environment is typically a shared one, and the vicissitudes of data optimisation require a replication factor of at least three, meaning that each unique 'raw data event' is stored in at least three locations across the cloud. Further, this is dynamic data replication, so your 'raw data event' could be in the northern hemisphere one moment and in the southern the next. Finally, the movement of this data between geographically distributed data centres regularly happens with neither administrator knowledge nor consent. This is a structural glitch in the cloud wherein data moves autonomously, in a seeming act of self-generated movement. Motility is, above all else, autonomous movement. In this specific instance, data motility is a material expression of the cloud's architecture and code. One data security expert bemoaned this strange phenomenon whereby cloud-stored data moves of its own accord, complaining of 'the headaches that come from unruly and nomadic information'. 46 The literal source of data motility, then, is strictly a material effect of cloud architecture. What is of greater critical analytical value and with Plato and Aristotle, calling into question that strict separation of the natural human and artificial technology. The digital human is a means for thinking of the human as always already being in a constitutive assemblage with technology. The specific material elements of that assemblage are of great importance, indicating the importance of historicising those mediating elements.
To suggest that the assemblage's material, technologically mediated elements (the nonhuman, as it were) have gained proper motility highlights what is unique about the digital human. Motility was also key to Hegelian logic wherein dialectics turn on the Being of life in its specific motility. Heidegger thought motility as constitutive of being as opposed to something that happens to being.
Here we should pause to think of the implications of originary technicity, wherein humanisation begins with the exteriorisation of memory into rudimentary stone tools. BSD, in this sense, is nothing but the exteriorisation of memory, of the quotidian, mediated actions of life. The motility of our BSD is not something that happens to us; it is constitutive of our being as digital humans. Keith Ansell Pearson provocatively reads Heideggerian motility in terms sympathetic to this perspective, positing a Deleuzian ethology, wherein it signifies the becoming of life but only ever in a deeply relational structure with 'environment'. 49 Finally, Marcuse posits the motility of being as the historicising rootedness in the world, linking it to both labour and radical acts of social, political and economic transformation. I put forth motility, then, because it denotes a potentially transformative becoming in a deeply recursive and historicised mediated environment. As such, data motility marks the tensions and struggles endemic to the digital human, and is in need of further critical inquiry. Just how the quotidian data that we generate moves autonomously of our control circumscribes the ontological ground of the digital human. In the space that remains, I want to suggest possible ontological implications made visible by the dispositif of data motility, specifically as it relates to life and labour under BSD.
If we return to the dispositif, then the assemblage of data motility resonates deeply with Ansell Pearson's reading of Heideggerian motility. The environment with which the becoming of life is relational is one conducive to the generation and intensive processing of BSD. One interpretation is that the proprietary environment of control engendered by data motility is also one wherein Being is in a state of data encumbrance. Here we can turn to Lazzarato, noted already for explicating the polyvalent nature of the dispositif, through the example of biopower-biopolitics, which in turn I am applying to mobility-motility. In his recent book The Making of Indebted Man, Lazzarato extends his critique of 'immaterial labour', which denotes the increasingly prominent role of communicative sociality in the generation of capitalist value. His thesis is that the debtor-creditor relationship is the core of the neoliberal condition. I find this suggestive, particularly in terms of data motility.
Lazzarato contends that debt breaks down the binaries producer-consumer and working-nonworking. He sees this as a radical extension of biopower wherein debt is a strategy of control, a rearticulation of its imperative 'become productive'. He posits 'indebted man' (sic) as the subjective figure of contemporary capitalism: 'Debt breeds, subdues, manufactures, adapts, and shapes subjectivity. What kind of subjectivity?' 50 Lazzarato situates debt as a correlate of Deleuzian control society, as opposed to the confinement of disciplinary society. This provides an interesting counterpoint to an important body of work which situates social networks and Web 2.0 environments in terms of data enclosure and confinement. 51 But how might debt be applied to the material environment of BSD? One way is to see a command of encumbrance, a strategy of control exercised through the dispositif of data motility. Let us recall how the conflations of consumer-producer and work-nonwork are hallmarks of the social web, of social media, and of immaterial labour 2.0. The rise of BSD takes us further along this continuum, as a more extensive and intensive variant.
Let me try to nuance this claim, as a means of outlining an approach to further study. I suggest that BSD comprise the endless payments we make to neoliberal digital or cognitive capitalism. In order to access any social media platform, any element of Web 2.0, we must generate social data. It is structurally unavoidable, and the motility of that data is the means by which its sociality is turned into economic value. This renders BSD as a key modality for responding productively to the command of neoliberal debt. As Lazzarato emphasises throughout his recent work, debt encourages and compels us to become the 'entrepreneurs' of ourselves, as 'human capital'. 52 The capital of the digital human is data. Data, as metadata and user-generated content, is highly productive for capital, given its strategy to buy low and sell high. This dynamic of debt runs through the dispositif of data motility. Social media, be it Facebook, Twitter or Google, is on the surface free for users. In turn, content is generated for free. The entire business model of social media platforms turns on selling that data as profitably as possible. Hence the growing appeal of the Hadoop environment, of the intensive and ongoing processing of BSD. It is an environment structured to maximise data motility wherein data moves autonomously of your control from the moment you generate it.
Yet as already noted, data motility is not just a dispositif of control which harnesses the digital traces of life for work; it is equally one offering new political and economic opportunities for constituent power and resistance. Motile data is social data, and the sociality of that data highlights its polyvalence: the social and economic valorisation that underpins social media. Sociality is the driver of BSD.
These are new mediated cultural practices, and the resulting BSD is generated by social, communicative, and affective relations. They are transformed into economic relations, as noted by Worstall, the Fellow from the Adam Smith Institute (which, it should be recalled, was the intellectual force behind privatisation under Thatcher). The circulation, exchange and valuation of such interlinked social data is crucial to the expansion of neoliberal digital capital. Nonetheless, it is the sociality of data, not the strategies of its capture, that coheres the dispositif of data motility.
Attention to the materiality of the dispositif of data motility, further, indicates that Worstall is right that 'we' are being 'sold' in social media. I find it far more interesting, however, to regard this not just as yet another normative capitalist relation, but as a new form of debt which encumbers the breadth and depth of our newly gained communicative and social capacity. When viewed this way, data motility signals concomitant possibilities of new digital commons and political action.
In this regard it seems nonsensical, as political strategy, to try and strip ourselves of BSD. There is a profound potentiality therein for expanded and intensified communicative and affective capacities. As Pybus notes, the archive of BSD, as a kind of archive of everyday life, is not merely the sedimented part but also a liminal space. 53 What seems intolerable is the prospect of it remaining a space for becoming a more profitable consumer, or a better surveilled subject. What a critical understanding of the dispositif of data motility helps clarify is that collective sociality comes before its capture by capital. Here we benefit from recalling that sociality drives the desiring assemblage of motility. This helps us think about ways to reject our data encumbrance and to reclaim what, after all, is ours. What might a radical embrace of data motility mean? What are the algorithmic codes that can create libidinal economies from a new data commons? What might a BSD commons look like? What kind of new sociality might emerge in critical projects of personal data curation? What are the political possibilities that data motility, which seems inherently deterritorialising, holds for, among other things, 'the exploit' about which Galloway and Thacker write so provocatively?
Let's give the last word to an emerging player in the Hadoop environment, Platfora, which was recently bolstered by major investment, including from In-Q-Tel, the CIA's venture capital arm. Platfora seeks to make BSD open to real-time, intuitive and serendipitous analysis for its corporate clients. 'Imagine what is possible…[when e]veryday business users can interactively explore, visualise and analyse any of that data immediately, with no waiting for an IT project. One question can lead to the next and take them anywhere through the data'. 54 What might be possible if such
Future research is necessary to comprehensively outline corporate Hadoop users and the specific forms of data analysis they perform. Here, I simply want to isolate a telling element of Facebook's BSD infrastructure. Again, following Foucault's imperative, I turn to the business press and quote at length Jay Parikh, Facebook's VP of Infrastructure Engineering: