# Bysort tab stata


I am using `tabstat` in Stata, with `estpost` and `esttab` to get its output into LaTeX. I use `tabstat` to display statistics by group, for example `tabstat assets, by(industry) missing statistics(count mean sd p25 p50 p75)`. The question I have is whether there is a way for `tabstat` or other Stata commands to display the output ordered by the value of the mean, so that the categories with higher means are on top.

By default, Stata displays the groups in alphabetical order of industry when I use `tabstat`.

Here `make` is like your variable `industry`: it is a string variable, so in tables Stata will tend to show it in alphanumeric order. First calculate the variable on which you want to sort (for example, the group mean). Then map those values to a variable with distinct integer values. As two groups could have the same mean (or other summary statistic), make sure you break ties on the original string variable. This new variable is created to have value 1 for the group with the lowest mean (or other summary statistic), 2 for the next lowest, and so forth.
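The mapping from group means to distinct integers can be sketched in plain Python. This is only an illustration of the idea, not the Stata code from the answer; the function name `rank_groups_by_mean` and the sample industries and values are hypothetical. Ties on the mean are broken alphabetically on the group label, and `descending=True` flips the order so that the highest mean gets rank 1.

```python
from statistics import mean


def rank_groups_by_mean(values, groups, descending=False):
    """Map each group label to an integer 1..k ordered by the group mean.

    Hypothetical helper: ties on the mean are broken on the label itself,
    mirroring the advice to break ties on the original string variable.
    """
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    means = {g: mean(v) for g, v in by_group.items()}
    sign = -1 if descending else 1  # flip so the highest mean comes first
    ordered = sorted(means, key=lambda g: (sign * means[g], g))
    return {g: rank for rank, g in enumerate(ordered, start=1)}


# Hypothetical data: "retail" and "banking" tie on a mean of 7.
groups = ["retail", "energy", "banking", "energy", "retail", "banking"]
values = [10, 4, 7, 6, 4, 7]

print(rank_groups_by_mean(values, groups))
print(rank_groups_by_mean(values, groups, descending=True))
```

Because the tie-break key stays ascending even when the mean is negated, tied groups keep alphabetical order in both directions.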

If the opposite order is desired, as in this question, flip the grouping variable around. There is a problem with this new variable: the values of the original string variable, here `make`, are nowhere to be seen.


We use the values of the original string variable as the value labels of the new variable. The idea is that the value labels become the "mask" that the integer variable wears. If you collapse your data to a new dataset, you can then sort it as you please.
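The "collapse, then sort" route has a direct pandas analogue: aggregate to one row per group, then sort that small table by its mean. This is a sketch with made-up data; `df`, `industry` and `assets` follow the question's names. `groupby(..., dropna=False)` keeps a missing-industry group, which is roughly what `tabstat`'s `missing` option does for the `by()` variable. Percentiles from `Series.quantile` use linear interpolation by default and may not match Stata's percentile definition exactly.

```python
import pandas as pd

# Hypothetical firm-level data using the question's variable names.
df = pd.DataFrame({
    "industry": ["B", "A", "C", "A", "B", "C"],
    "assets": [10.0, 1.0, 5.0, 3.0, 20.0, 7.0],
})


def p25(s):
    return s.quantile(0.25)


def p75(s):
    return s.quantile(0.75)


# Collapse to one row per industry, then sort so the highest mean is on top.
table = (
    df.groupby("industry", dropna=False)["assets"]
    .agg(count="count", mean="mean", sd="std", p25=p25, p50="median", p75=p75)
    .sort_values("mean", ascending=False)
)
print(table)
```

To break ties on the mean, one option is `table.reset_index().sort_values(["mean", "industry"], ascending=[False, True])`.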

I would look at the egenmore package on SSC. You can get that package by typing `ssc install egenmore` in Stata.

In particular, I would look at the entry for axis in the help file of egenmore. That contains an example that does exactly what you want.


### Missing data in Stata

This module will explore missing data in Stata, focusing on numeric missing data. It will describe how to indicate missing data in your raw data files, as well as how missing data are handled in Stata logical commands and assignment statements. We will illustrate some of the missing data properties in Stata using data from a reaction time study with eight subjects, indicated by the variable `id`, whose reaction times were measured at three time points (`trial1`, `trial2` and `trial3`).

The input data file is shown below. You might notice that some of the reaction times are coded using a single dot (.), which is how Stata marks a missing value. As a general rule, Stata commands that perform computations of any type handle missing data by omitting the row with the missing values. As you see in the output below, summarize computed means using 4 observations for trial1 and trial2 and 6 observations for trial3. In short, the summarize command performed the computations on all the available data.
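For comparison, pandas behaves like summarize here: `count` and `mean` skip missing values column by column. The sketch below uses made-up reaction times chosen only so that the non-missing counts match the text (4, 4 and 6); `NaN` plays the role of Stata's dot.

```python
import numpy as np
import pandas as pd

# Hypothetical reaction times for eight subjects; NaN stands in for Stata's "."
df = pd.DataFrame({
    "id": range(1, 9),
    "trial1": [1.2, np.nan, 1.5, np.nan, 1.8, np.nan, np.nan, 1.1],
    "trial2": [np.nan, 1.4, np.nan, 1.6, 1.9, np.nan, 1.3, np.nan],
    "trial3": [1.0, 1.3, np.nan, 1.7, 1.5, 1.2, np.nan, 1.4],
})

# Like summarize: each column uses all of its own non-missing values.
summary = df[["trial1", "trial2", "trial3"]].agg(["count", "mean"])
print(summary)
```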

A second example shows how the tabulation command tab1 handles missing data. Like summarize, tab1 uses just the available data. Note that the percentages are computed based on the total number of non-missing cases. You might instead want the percentages computed out of the total number of observations, with the percentage missing for each variable shown in the table.
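In pandas, the two percentage bases correspond to `value_counts` with and without `dropna=False`. This is a sketch on a made-up series; the name `trial1` is reused only for continuity.

```python
import numpy as np
import pandas as pd

# Hypothetical variable with one missing value out of four observations.
s = pd.Series([1.0, 1.0, 2.0, np.nan], name="trial1")

# Percentages out of non-missing cases (missing values excluded, like tab1).
pct_available = s.value_counts(normalize=True) * 100

# Percentages out of all observations, with a row for missing (like tab1, missing).
pct_total = s.value_counts(normalize=True, dropna=False) * 100

print(pct_available)
print(pct_total)
```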

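In Stata, percentages out of all observations, with a row for missing values, are obtained by adding the missing option (which can be abbreviated m) to the tabulation command, as in `tab1 trial1, missing`.

Next, consider the correlate command. We would expect it to perform the computations based on the available data and omit the missing values.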

Here is an example command; its output is shown below. Note how the missing values were excluded: Stata performs listwise deletion and displays correlations only for observations that have non-missing values on all the variables listed. Stata also allows pairwise deletion (the pwcorr command), where each correlation is computed from the observations that have non-missing values for that pair of variables.
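The listwise/pairwise distinction can be reproduced in pandas, with the caveat that the defaults are reversed: `DataFrame.corr()` uses pairwise-complete observations (like pwcorr), while dropping incomplete rows first gives listwise deletion (like correlate). The data below are made up so that the two results differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is missing trial3, row 5 is missing trial2.
df = pd.DataFrame({
    "trial1": [1, 2, 3, 4, 5, 6],
    "trial2": [2, 4, 6, 8, 10, np.nan],
    "trial3": [np.nan, 1, 2, 3, 5, 4],
})

# Pairwise deletion: each pair uses every row where both values are present.
pairwise = df.corr()

# Listwise deletion: keep only rows with no missing values, then correlate.
complete = df.dropna()
listwise = complete.corr()

print(len(complete), "complete rows")
print(pairwise.loc["trial1", "trial3"], listwise.loc["trial1", "trial3"])
```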

### Problem with bysort and tabulate using asdoc

Rodrigo Badilla (20 Nov): Hi all, I wonder if it's possible to use asdoc with the bysort: prefix and tabulate.

Attaullah Shah:

Presently, the bysort prefix does not work with tabulation commands. I shall work on it in the next update. In the meanwhile, one workaround is to use a loop and produce the tabulation of the given variables for each distinct value of the bysort variable.

Regards,
Attaullah Shah, PhD
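The loop workaround can be sketched in pandas: iterate over each distinct value of the by-variable and build one frequency table per group, here with a percent column next to the counts. This is not asdoc and writes nothing to Word; the variables `region` and `answer` are hypothetical.

```python
import pandas as pd

# Hypothetical data: a by-variable and a variable to tabulate.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "answer": ["yes", "no", "no", "no", "yes"],
})

tables = {}
for region, sub in df.groupby("region"):
    # One tabulation per distinct value of the by-variable.
    counts = sub["answer"].value_counts()
    tables[region] = pd.DataFrame({
        "freq": counts,
        "percent": counts / counts.sum() * 100,
    })
    print(f"region = {region}")
    print(tables[region], end="\n\n")
```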

Dear Attaullah Shah, thanks for your reply; as always, your solutions work great! Just one question: is it possible to get the percentages in a side column? Regards and thanks in advance, Rodrigo.

Amanda Wyant: Is it possible to use asdoc with sum and bysort?

Yes, it is possible.

For more tips and tricks on summary statistics with asdoc, you can watch this YouTube video. (Last edited by Attaullah Shah, 20 May.)

When I run this command, only one of my Level2 variables shows up in the Word document.

I cannot tell much without seeing an example of your dataset and the code that you have used. You can post a sample of your data using dataex from SSC.

Abe Habeshaw: Dear Prof. Shah, thank you for writing this excellent program. I am using Stata and need your help with a couple of issues.

I get the same error when I use my own data too.

In management research, we usually need to create a variable that measures the experience of firms. Firms accumulate experience as they make acquisitions or invest in companies in certain countries. Sometimes this experience has an effect on future decisions, so we calculate variables that measure the number of times a firm has made an acquisition or has invested in a certain industry or country.

### How To / STATA: Calculate Variables for Groups of Observations

In this post I will calculate an experience variable using a fictitious dataset. Consider the dataset shown in Table 1 below. It has four variables: Firm, Country, Year, and Investments.

The dataset (Table 1) describes the amount of investments a Firm has made in a Country each Year; the variable Investments is non-negative. To create the variable ExperienceCountry (column 5 in Table 2), we have to sum the investments a firm has made in a country from the first year up to the current year. We do this using the `bysort` and `gen` commands, as in `bysort Firm Country (Year): gen ExperienceCountry = sum(Investments)`. The `bysort` command sorts the observations by Firm, Country, and Year.

Sorting by Firm first gives two groups of observations: Firm A and Firm B. Then, since we also sort by Country, we get two subgroups within each group: Brazil and Russia.

Finally, within each country, the observations are sorted from the first to the last year. After sorting the observations, we use `gen` to calculate the ExperienceCountry variable.
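The same running sum can be sketched in pandas: sort by Firm, Country and Year, then take a cumulative sum within Firm-Country groups, which plays the role of Stata's running-sum `sum()` under `bysort Firm Country (Year):`. Grouping by Year as well reproduces the version without parentheses. The variable names come from the post; the data are hypothetical, with the same four variables as Table 1.

```python
import pandas as pd

# Hypothetical data with the same four variables as Table 1, deliberately unsorted.
df = pd.DataFrame({
    "Firm": ["A", "A", "A", "A", "B", "B"],
    "Country": ["Brazil", "Brazil", "Russia", "Brazil", "Russia", "Russia"],
    "Year": [2002, 2000, 2001, 2001, 2000, 2002],
    "Investments": [2, 1, 1, 1, 3, 0],
})

# Sort as bysort would: Firm, then Country, then Year.
df = df.sort_values(["Firm", "Country", "Year"]).reset_index(drop=True)

# Running total within Firm-Country groups (Year only orders the rows).
df["ExperienceCountry"] = df.groupby(["Firm", "Country"])["Investments"].cumsum()

# Grouping on Year too restarts the sum every year (the no-parentheses case).
df["WithinYearOnly"] = df.groupby(["Firm", "Country", "Year"])["Investments"].cumsum()

print(df)
```

In this made-up data, row 3 (Firm A, Brazil, 2002) gets 1 + 1 + 2 = 4.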

For each row (observation), we sum the investments made so far, including those of the current observation. Thus, we get a 4 in row 3 because, across the three years up to and including that row, Firm A made 1 + 1 + 2 = 4 investments in that country. But why do we write the variable Year in parentheses?

Putting Year in parentheses means that, although `bysort` sorts the observations by Year, the operation performed after `bysort` (here `gen` with `sum()`) is carried out within the groups formed by the values of Firm and Country only. In other words, suppose we instead wrote `bysort Firm Country Year: gen ExperienceCountry = sum(Investments)`, without the parentheses.

Then we would be summing the investments made by each firm in each country within each single year only.

Special thanks to John Coglianese for feedback and for supplying the list of "vital" Stata commands. Feedback and requests for additions to the list are always welcome!

The official Pandas documentation includes a "Comparison with Stata" page which is another great resource. In Stata, you have one dataset in memory. Everything in Stata is built around this paradigm.

Python is a general purpose programming language where a "variable" is not a column of data.

### Stata to Python Equivalents

Variables can be anything, a single number, a matrix, a list, a string, etc. The Pandas package implements a kind of variable called a DataFrame that acts a lot like the single dataset in Stata. It is a matrix where each column and each row has a name. The key distinction in Python is that a DataFrame is itself a variable and you can work with any number of DataFrames at one time.

You can think of each column in a DataFrame as a variable just like in Stata, except that when you reference a column, you also have to specify the DataFrame. The Stata-to-Python translations below are written assuming that you have a single DataFrame called df. Python doesn't have "labels" built into DataFrames like Stata does. However, you can use a dictionary to map data values to labels when necessary.
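A dictionary can act as a value label, much like the "mask" idea earlier: an integer column holds the ranks, and `Series.map` shows the names. The column names and labels below are hypothetical.

```python
import pandas as pd

# Hypothetical integer variable, e.g. a rank created from group means.
df = pd.DataFrame({"industry_rank": [2, 1, 3, 1]})

# The dictionary plays the role of a Stata value label.
rank_labels = {1: "banking", 2: "retail", 3: "energy"}
df["industry"] = df["industry_rank"].map(rank_labels)

# Sorting on the integer column orders rows by rank; the mapped column shows names.
df = df.sort_values("industry_rank", kind="stable")
print(df)
```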

There is no general equivalent to tsset in Python. However, you can accomplish most, if not all, of the same tasks using a DataFrame's index (the rows' equivalent of column names).
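As a rough sketch of what tsset enables, a sorted (id, time) MultiIndex plus a grouped `shift` gives a lag within each panel unit. The data and names are hypothetical. One caveat: `shift` is positional, so unlike Stata's `L.` operator it does not account for gaps in the time variable.

```python
import pandas as pd

# Hypothetical panel: two units observed over three periods, rows out of order.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [3, 1, 2, 1, 2, 3],
    "x": [30, 10, 20, 5, 6, 7],
})

# Roughly what "tsset id time" sets up: a sorted (id, time) index.
panel = df.set_index(["id", "time"]).sort_index()

# Previous row's x within each id; the first period of each id becomes NaN.
panel["x_lag"] = panel.groupby(level="id")["x"].shift(1)
print(panel)
```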

In Python and Pandas, a DataFrame index can be anything (though you can also refer to rows by their position). It can also be hierarchical, with multiple levels. It is a much more general tool than tsset. Merging Pandas DataFrames does not require you to specify "many-to-one" or "one-to-many": Pandas figures that out based on whether the variables you're merging on are unique or not.
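Here is a minimal many-to-one merge sketch with hypothetical firm data. No merge-type keyword is needed. `validate=` is optional and only checks the assumption. `how=` chooses which rows to keep. `indicator=True` adds a `_merge` column similar in spirit to the one Stata's merge creates.

```python
import pandas as pd

# Hypothetical data: many rows per firm on the left, one row per firm on the right.
firm_years = pd.DataFrame({"firm": ["A", "A", "B", "C"], "year": [2000, 2001, 2000, 2000]})
firms = pd.DataFrame({"firm": ["A", "B"], "industry": ["retail", "energy"]})

merged = firm_years.merge(
    firms,
    on="firm",
    how="left",                 # keep every left row, matched or not
    validate="many_to_one",     # optional check that "firm" is unique on the right
    indicator=True,             # adds a _merge column: both / left_only / right_only
)
print(merged)
```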

However, you can specify which sub-sample of the merge to keep using the keyword argument `how`.

But this difference also makes reshaping a little easier in Python. Here Input 3 creates a DataFrame, Input 4 gives each of the index columns a name, and Input 5 names the columns.

Coming from Stata, it's a little weird to think of the column names themselves having a "name", but the column names are just an index, like the row names are. It starts to make more sense when you realize columns don't have to be strings. They can be integers, like years or FIPS codes. In those cases, it makes a lot of sense to give the columns a name so you know what you're dealing with. Input 6 does the reshaping using `unstack('time')`, which takes the index 'time' and creates a new column for every unique value it has.

Notice that the columns now have multiple levels, just like the index previously did. This is another good reason to label your index and columns. If you want to access either of those columns, you can do so as usual, using a tuple to differentiate between the two levels. If you want to combine the two levels (as Stata does by default), you can simply rename the columns. The pivot command can also be useful, but it's a bit more complicated than stack and unstack, and it is better to revisit pivot after you are comfortable working with DataFrame indexes and columns.
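The reshape steps described above (name the index and columns, `unstack('time')`, select with a tuple, then flatten the two column levels) can be sketched end to end. The data and the `var` name for the column index are hypothetical; this is not the original page's Inputs 3 to 6.

```python
import pandas as pd

# Hypothetical long data: one row per (id, time).
long_df = pd.DataFrame(
    [[1, 2000, 5.0, 50], [1, 2001, 6.0, 60], [2, 2000, 7.0, 70], [2, 2001, 8.0, 80]],
    columns=["id", "time", "x", "y"],
).set_index(["id", "time"])
long_df.columns.name = "var"  # give the column index a name too

# Long to wide: one column per (variable, time) pair.
wide = long_df.unstack("time")

# A tuple selects across the two column levels.
x_2001 = wide[("x", 2001)]

# Flatten the two levels into Stata-style names such as x2000 and y2001.
wide.columns = [f"{var}{time}" for var, time in wide.columns]
print(wide)
```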

In Python, missing values are represented by a NumPy "not a number" object, `np.nan`; in Stata, a missing value is shown as a dot (.).

Stata Daily. Collecting, organising, and analysing data is expensive, but so is doing nothing. Is it necessary to put observations in a certain order? In a number of cases, yes. The most obvious case is when you are using the qualifier -in- to specify a subset of your data.

This is when -sort- and -gsort- come in handy: both put the observations in a certain order. The -sort- command puts the observations in ascending order based on a specific variable or set of variables. The basic syntax for -sort- is `sort varlist`. If varlist is only one variable, then Stata will sort the observations in ascending order based on that variable.

If there are two variables after sort, var1 and var2, Stata sorts the observations according to var1 first. Then, for observations with a common value of var1, Stata sorts them according to var2. If there are more than two variables, the observations are sorted by the first variable first, then by the second variable, and so on.

For potential users coming from Stata, this page is meant to demonstrate how different Stata operations would be performed in pandas.
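As one concrete instance of that mapping, the multi-variable -sort- described in the Stata Daily post corresponds to `DataFrame.sort_values` with a list of keys. Stata's -gsort- also allows descending order (for example `gsort -var1 var2`); in pandas, `ascending=` takes one flag per key. The data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": [2, 1, 2, 1],
    "var2": ["b", "d", "a", "c"],
})

# Like "sort var1 var2": ascending on var1, then on var2 within ties.
asc = df.sort_values(["var1", "var2"])

# Like "gsort -var1 var2": descending on var1, ascending on var2.
mixed = df.sort_values(["var1", "var2"], ascending=[False, True])

print(asc)
print(mixed)
```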

As is customary, we import pandas and NumPy with `import pandas as pd` and `import numpy as np`. This means that we can refer to the libraries as pd and np, respectively, for the rest of the document.

Throughout this tutorial, the pandas DataFrame will be displayed by calling `df`. This is often used in interactive work (e.g. a Jupyter notebook or a terminal); the equivalent in Stata is the `list` command. A DataFrame in pandas is analogous to a Stata data set: a two-dimensional data source with labeled columns that can be of different types.

As will be shown in this document, almost any operation that can be applied to a data set in Stata can also be accomplished in pandas. A Series is the data structure that represents one column of a DataFrame. Every DataFrame and Series has an Index — labels on the rows of the data.

Stata does not have an exactly analogous concept. While using a labeled Index or MultiIndex can enable sophisticated analyses and is ultimately an important part of pandas to understand, for this comparison we will essentially ignore the Index and just treat the DataFrame as a collection of columns. Please see the indexing documentation for much more on how to use an Index effectively. A Stata data set can be built from specified values by placing the data after an input statement and specifying the column names.

A pandas DataFrame can be constructed in many different ways, but for a small number of values, it is often convenient to specify it as a Python dictionary, where the keys are the column names and the values are the data.
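A small sketch of building a DataFrame from a dictionary, with the Stata equivalent being an `input x y` block of values ended by `end`. It also shows that a single column is a Series and that rows carry a default Index. The column names and values are made up.

```python
import pandas as pd

# Keys become column names; each value is that column's data.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

col = df["x"]               # a single column is a Series
print(type(col).__name__)
print(df.index)             # default integer row labels 0, 1, 2
print(df)
```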

Like Stata, pandas provides utilities for reading in data from many formats. The tips data set, found as a CSV file within the pandas tests, will be used in many of the following examples.

Stata provides import delimited to read CSV data into a data set in memory. The pandas counterpart is `read_csv()`, which additionally will download the data set automatically if given a URL. For example, if the data were instead tab delimited, did not have column names, and existed in the current working directory, the pandas command would pass `sep="\t"` and `header=None` to `read_csv()`. Pandas can also read Stata data sets (`.dta` files) with `pd.read_stata()`, and other formats are likewise read via `pd.read_*` functions. See the IO documentation for more details.
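A minimal sketch of the tab-delimited, no-header case. To keep it self-contained, the text is read from an in-memory buffer instead of a file in the working directory; the three column names are supplied by hand and, like the values, are only illustrative.

```python
import io

import pandas as pd

# Hypothetical tab-delimited text with no header row.
raw = "16.99\t1.01\tSun\n10.34\t1.66\tSun\n"

tips = pd.read_csv(
    io.StringIO(raw),
    sep="\t",                             # tab delimited
    header=None,                          # first line is data, not column names
    names=["total_bill", "tip", "day"],   # supply names ourselves
)
print(tips)
```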

The inverse of import delimited in Stata is export delimited. Pandas can also export to Stata file format with the `DataFrame.to_stata()` method.
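A round trip through the Stata format can be sketched with `to_stata` and `read_stata`, writing to a temporary directory. The data and the file name `tips.dta` are hypothetical. `write_index=False` stops pandas from saving the row index as an extra column.

```python
import os
import tempfile

import pandas as pd

tips = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "day": ["Sun", "Sun"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tips.dta")
    tips.to_stata(path, write_index=False)  # write a Stata .dta file
    back = pd.read_stata(path)              # read it back into a DataFrame

print(back)
```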