Network effects as value drivers for online digital companies

This research develops valuation methodology of digital companies which exploit network effects. The main asset of companies of this type is their user base. The business model design makes the user base of many of the companies to be directly observable and measurable by any user in the network (wholly or at least partially through sampling). This creates an opportunity for market participants to get an in-depth understanding of the state of the business of the company. This research starts with a brief recap of the key characteristics of networks and dynamic processes on them. After that the most common business model patterns of network companies are mapped and analyzed using business model canvas. Having obtained the mechanics of the business and the qualities of networks of users the DCF valuations are conducted. The baseline DCF simulations use top-down approaches for projecting cash-flows, growth and risks and the test case simulations use network science based approaches. The last part of the research is devoted to empirical testing of the influence of the network effects on company pricing using cluster analysis and multiple regression techniques. The findings of this research are of a value to valuation practitioners, standard setters and IR departments. ​

Original document

Table of Contents:


Valuation of young growth companies is a challenging task – they not only don’t have a long story, but also often operate in new businesses. Moreover, business models of the new growth companies often exploit various network effects which makes the behavior of their financials non-linear and confusing to the market participants. As a consequence, the stock price of such type of companies is highly volatile and the probability of being valued wrong by the market increases since the market relies on a wrong set of signals – in the dotcom bubble it relied mostly on website visitors, and now it tends to rely on the quantity of users. Current company information disclosure standards do not allow to construct reliable user-based company valuation models. As a consequence, top-down aggregate approach to valuation has been commonly used mainly due to its simplicity given that disaggregate approaches to valuation and pricing gave comparable or higher uncertainty about the company value estimates while bringing in the model a huge number of assumptions about unit economics of the businesses.

This research is an attempt to extend the intrinsic value and pricing approaches to value a user in a network.  It will focus on a set of young growth companies which significantly rely on network effects in their business models and which have to expose information about networks of their users (or at least part of it) by design of their business models. Recent developments in the network theory and availability of cheap computing power and publicly available data may allow to increase the accuracy of DCF valuation models for this type of companies. For companies like Facebook, LinkedIn, Xing and other online social networks data about their user-base (or at least some part of it) is publicly available. For example, the head of data-analytics company Tazeros Global claimed that nowadays any student with a laptop can scrape the data of the whole online social network in one week (basic user profile data, friendships and posts) and the whole dataset of the Russian segment of Facebook would occupy less than 100 Gb of space (Hachuyan 2017). However, in order to assess the value of a network it might not be necessary to have a dataset which represents the whole network. Sometimes companies disclose datasets themselves when they run Hackathons and challenges from which they hope to crowdsource new algorithms to extract more value from their user data. In addition, the data obtained from various data breaches could also be used as a source for building such models regardless of the ethical and legal issues.

Facebook, LinkedIn and other existing online social networks have already mapped significant portion of the real-world social relationships and given that the Facebook has a user base which reaches 90% in some of the regions and countries it operates in it has become possible to use the user data to predict how other network companies business models would perform in such networks. Therefore, in order to build disaggregated user-based valuations of companies which don’t expose their networks to public like Uber or Lyft one could use Facebook’s or LinkedIn’s graphs enhanced with other information one can attach to it, or, alternatively, a simulated graph which reproduces the main qualities of the real-world network if interest.

It will be examined whether the complexity of bringing the network theory instruments in the valuation and pricing models adds value to them by exploring the tradeoff between additional complexity and the plausibility of assumptions vs. increased precision of the valuation.

This research consists of 3 chapters: chapter 1 is devoted to the basic qualities of networks and processes occurring on them, chapter 2 deals with methodological and practical aspects of valuation of online network companies. In the final chapter the companies’ disclosure standards are being analyzed using computer linguistic and statistical techniques. Findings and simulation algorithms developed as the result of this research will be of a value to researchers and practitioners of business valuation and allow to generate more accurate inputs for valuation models.

Chapter 1. Value drivers of a digital network company

1.1 Networks – key characteristics and network effects

For a proper bottom-up valuation of network companies it is necessary to briefly review the basic concepts and statistical tools of network theory. The network science is a relatively young discipline. This research relies on the systematizations in the field by Barabási and Newman. The following notation is used: a network is a set of nodes. Nodes of a network share connections that may take form of edges or hyperedges (Newman 2003). Edges represent connections between two nodes and may be either directed from one node to another or undirected, while hyperedges connect more than two nodes and are undirected (Newman 2003). Nodes may carry additional quantitative and qualitative parameters. Edges may be assigned with weights which allow to quantify the relationships among edges in the network model. The same nodes may simultaneously be included in several different networks. For example, in a real-life social network each person (node) has separate sets of professional and private contacts (edges) and, at the same time, is associated with various social groups and organizations (hyperedges). The notion of a node also depends on the design of the research model. Nodes may as well represent groups of people that share a common characteristic of interest (nationality, age, gender, location, income etc.), therefore, broadly, a node may be defined as an actor in a network and the edges and hyperedges – as relationships among those actors. The same real-life network may be modeled in several ways: 1. using only nodes and edges or 2. using nodes, edges and hyperedges. In the first case the nodes would represent a complex object: for example, a group of people living in the same city, country etc. If one deems necessary for the nodes to represent minimal unit of the network then a different model would be constructed: nodes would represent individuals instead of groups, edges would bear the same function as in the previous case, and hyperedges would embed in the model the grouping by city, country, etc.. The use of hyperedges brings multidimensionality in the model and makes the analysis more complex. However, multidimensionality of a network allows to account for all the necessary factors simultaneously while in the network models that use only nodes and edges are focused on a particular parameter which is used to group the atoms of a network to make up a node. Another way to avoid using hyperedges is to use nodes of different types (movies and actors, chemical elements and nutrition products etc.). The networks which include several types of nodes are called multipartite (Barabási 2016). In some networks nodes may have more than one edge between each other and this type of graph is called multigraph. In some network models nodes may be allowed to interact with themselves which is represented by self-loops – edges, which are connected only to one node. The choice of the network model has to be aligned with the research objectives.  This research is focused on online social network companies, therefore (unless specified otherwise) the nodes represent people and edges/hyperedges represent relationships among the people.

The qualities of simple networks with a low number of nodes and edges may be analyzed using graphical visualizations, that allow to explore all possible paths in networks and get exact values for various parameters that characterize a network (Newman 2003). However, real-world networks are much more complex. They are made up by billions of nodes and edges/hyperedges consequently analysis of their structure and processes which occur in them is only possible through the use of the graph theory and statistical techniques (Newman 2003).

Nodes, edges, hyperedges of networks possess quantitative characteristics. The number of connections of a node (ki) is a degree of a node. Total number of connections in a network (L) is ½ of the sum of degrees of all the nodes in the network (Eq. 1.1).

Formula - Total number of connections in a network  (Barabási 2016)(1.1)

For any network the total number of connections lies between 0 and   determined by Eq. 1.2.

(Barabási 2016)(1.2)

If the network has maximum number of connections it may possibly have – every node is connected to every other node in the network, it is called a complete graph.

One of the measures of interconnectedness of a network is the average degree  of a network:

(Barabási 2016)(1.3)

The average degree in a network with N nodes may vary from 0 for the networks with no edges which are called empty (or null) graphs to   for complete graphs in which every node has connections with all other nodes.

Another measure of interconnectedness of a network is the network density, which is the ratio of the edges present in the network to the maximum potential number of edges the network can have given its number of nodes. Network density can be calculated using the formula:

(Barabási 2016)(1.4)

Edges in a graph may represent one-way relationships and make up a directed graph. In directed graphs degree of a node equals to the sum of its incoming degree (nodes that point to the node) and outgoing degree (number of nodes that the node is pointing to).

Degree distribution is one of the key properties of networks. In graph theory it is described by a normalized probability function which expresses the dependency of the probability of a randomly chosen node in a network to have a degree equal to k.

(Barabási 2016)(1.5)
(Barabási 2016)(1.6)

The three possible extremes in degree distributions are illustrated on the Figures 1-3 using networkx package for Python (Hagberg, Swart and Chult 2008). 

Figure 1. Degree histogram Path graph


Figure 2. Degree histogram Star graph

Figure 3. Degree histogram
Complete graph

The first possible extreme is a path graph – all nodes have a degree of two and two nodes at the ends of the graph have a degree of one. The second one is a “star” graph – all nodes are connected to one hub, therefore all nodes have degree equal to one and the hub’s degree is equal to nine. The third possible extreme is a complete graph – all nodes are connected to all other nodes and have a degree of 9 (N-1).

    Network path is a chain of edges from one node to another. The path length is equal to number of edges it contains. The shortest path between two nodes is the one which has the least number of edges. In a directed network the existence of a path from a node x to a node y does not guarantee the existence of a path from node the y to the node x. Network diameter is the longest path that exists in a network. In small networks the network diameter is easy to calculate, while in large networks with millions of nodes this becomes an non-trivial task for which various algorithms have been developed: breadth-first search (Barabási 2016), navigation algorithm (Liu et al. 2019), Dijkstra’s algorithm (Dijkstra 1959) and others. The average of all shortest paths between all pairs of nodes is called average path length of a network.

Networks may be composed of several sub-networks which are not connected to each other. These sub-networks are called connected components.  For each connected component clustering coefficients of nodes reflect how even connections are spread. The local clustering coefficient may be calculated using the formula:

(Barabási 2016)(1.7)

where  is the sum of degrees of the nodes which are connected to the node i. The value of the clustering coefficient lies between 0 (the neighbors of the node i don’t share connections with each other) and 1(the neighbors of the node i are all interconnected and form a complete graph). The local clustering coefficient may also be viewed as a local network density measure (Barabási 2016). The clustering of the whole network can be measured by calculating average clustering coefficient by the formula:

(Barabási 2016)(1.8)

Uneven clustering results in emergence of locally dense structures in a network which are called communities. It is possible to define communities in two ways – strong and weak. Strong communities are those in which each of the nodes has more connections with other community members than with the rest of the network. Weak communities are subgraphs in which internal to external cumulative degree ratio is greater than 1 (Barabási 2016). Community detection is a very complicated task since in order to be able to detect the best partitioning one has to analyze all possible combinations of communities in a graph which grows exponentially to the size of a network. Instead, there are two types of algorithms which can be used to complete the community detection task in feasible time with a certain probability of the best slicing of the network: agglomerative and divisive algorithms. Agglomerative algorithms find similarities in groups of nodes and merge them into communities (for example, Ravasz algorithm (Ravasz et al. 2002) or link clustering algorithm (Ahn, Bagrow and Lehmann 2010, Evans and Lambiotte 2009)). Divisive algorithms, like Girvan-Newman Algorithm (Girvan and Newman 2002,  2004), are meant to detect the least similar connections and remove eliminate them to obtain communities. The criteria of similarity vary from one algorithm to another. The community partition problem can also be approached by clustering links instead of nodes based on their topological similarity (Barabási 2016). The main application of the community identification algorithms is identification of customer groups and their interests in order to align marketing and sales strategies of companies.

Network density, average degree, average shortest path length, network diameter and average clustering coefficient are the basic descriptive statistics for a graph. In Table 1 these statistics are computed for the 3 sample 10-node graphs pictured on Figures Figure 1Figure 3. In the Table 2 the same statistics have been computed for the research datasets for three popular online social networks – (Facebook, Twitter, LifeJournal). Figures Figure 4-Figure 6 contain their graphical representation and degree distribution histograms.

Table 1. Descriptive statistics of the 3 basic 10-node graphs


Path graph
(Figure 1)

Star graph
(Figure 2)

Complete graph (Figure 3)

Number of edges




Network density




Average degree




Average shortest path length




Network diameter




Average clustering coefficient





Figure 4. Degree histogram Facebook sample Graph