WITH Salaries AS ( SELECT Salary, ROW_NUMBER() OVER(ORDER BY Salary DESC) AS 'RowNum' FROM Employee ) SELECT Salary FROM Salaries WHERE RowNum = 5
We can select random rows using NewID(), but the query can become expensive as the table becomes very large. Let's first see how to select random rows and then we'll improve the query for performance.
Selecting 10 Random Rows using NewID()
SELECT TOP 10 * FROM myTable ORDER BY NEWID()
The NewID() function is simple. It generates a pseudo-random uniqueidentifier every time it is run. If you execute the following statement, you will get a different result each time.
The uniqueidentifier's binary representation is 128 bits, which means that there can be in total 3.4 * 10^38 unique identifiers, which is a huge number and hence the probability of getting the same number twice is minuscule. So if we order the rows of a table by NewID() we would get random rows.
If you look at the query carefully you'll soon realise that the query has to assign a NewID() for each row. Then it has to sort by the uniqueidentifier (Sql Server would optimize this by stopping when it has found the top 10 rows, as it doesn't need to sort the rest). For a table with billions of rows it can get very slow.
Wouldn't it be nice if we could ask SQL Server for a random set of data rows, and then use NewID() method to sort the subset, which would be so much faster? Well we surely can ask Sql Server to do exactly that (SqlServer 2005 and above). Here's how we ask for a random page of 1000 rows.
SELECT * FROM myTable TABLESAMPLE (1000 ROWS)
Important: The above query wont exactly return 1000 Rows, but close to 1000.
The above query returns a random set from anywhere in the table, the rows inside the set still maintain the default order and are not random, so we will still have to use NewId() to randomize the above set. As NewId() will now work on a subset of table it would be much more efficient than the original query.
SELECT TOP 10 * FROM myTable TABLESAMPLE (1000 ROWS) ORDER BY NEWID()
One important point to keep in mind is that your results would be more localized in the second case as compared to the first non-optimized query as all rows though random would at max be 1000 rows away from each other. So select a sufficiently large value for TABLESAMPLE parameter to get more non-localized result. Also TABLESAMPLE can take as a parameter percentages. e.g. TABLESAMPLE(10 PERCENT)
varchar and nvarchar both are variable length storage columns (i.e. they only take up as much space as required as against char & nchar which take up fixed amount of space irrespective of data length).
The difference between varchar & nvarchar is nvarchar can store any Unicode data thus supporting text in all languages while a varchar column is limited to 8-bit code pages and cannot support text in languages which require unicode data.
So nvarchar can end up taking up double the space than that required for varchar.
However also note as most modern systems now use Unicode, using varchar may internally require conversion which could be expensive.
Local Temp Table
A local temporary table, #temptable, exists only for the duration of a user session or the procedure that created the temporary table. When the user logs off or when the procedure that created the table completes, the local temporary table is lost. Multiple users can't share a local temporary table because it is local to one user session. You also can't grant or revoke permissions on the local temporary table.
Global Temp Table
A global temporary table, ##temptable, also exists for the duration of a user session or the procedure that created the table. When the last user session that references the table disconnects, the global temporary table is lost. However, multiple users can access a global temporary table; in fact, all other database users can access it. But you can't grant or revoke permissions on the global temporary table because it's always available to everyone.
An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
An index that ensures the uniqueness of each value in the indexed column.
A unique index is automatically created when you define a primary key or unique constraint:
Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.