Common Econometrics Commands: A Python & Stata Comparison
2023/1/3 17:45:31
Source: compiled from Stata to Python Equivalents
http://www.danielmsullivan.com/pages/tutorial_stata_to_python.html
1. Data input and output
Description | Stata | Python |
Log file | log using | Python has no direct equivalent. Unlike Stata, a Python script does not display results automatically; you have to call the print function explicitly. A Jupyter notebook is the closest equivalent. |
Help | help | help(pd.read_stata) OR, in IPython, append ? to the name (as in pd.read_stata?) |
Change the working directory | cd some/other/directory | import os; os.chdir('some/other/directory'), but this is bad practice. It is better to use full pathnames whenever possible. |
Load a dataset | use my_file | import pandas as pd; df = pd.read_stata('my_file.dta') |
Load selected variables | use var1 var2 using my_file | df = pd.read_stata('my_file.dta', columns=['var1', 'var2']) |
Import Excel data | import excel using <excel_file> | df = pd.read_excel('<excel_file>') |
Import CSV data | import delimited using my_file.csv | df = pd.read_csv('my_file.csv') |
Save data | save my_file, replace | df.to_stata('my_file.dta') OR df.to_pickle('my_file.pkl') for a Python-native file type. |
Export to CSV | outsheet using my_file.csv, comma | df.to_csv('my_file.csv') (pass index=False to leave out the row index, which pandas writes by default) |
Export to Excel | export excel using <excel_file> | df.to_excel('<excel_file>') |
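The read and write calls in the table above can be checked with a small round trip. This is only a sketch: the toy DataFrame and the temporary directory are assumptions made for illustration, and the file names simply reuse my_file from the table.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data; the variable names are illustrative only.
df = pd.DataFrame({
    'var1': [1, 2, 3],
    'var2': [0.5, 1.5, 2.5],
    'var3': [10, 20, 30],
})

# Work in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    dta_path = os.path.join(tmp, 'my_file.dta')
    csv_path = os.path.join(tmp, 'my_file.csv')

    # Stata: save my_file, replace
    df.to_stata(dta_path, write_index=False)
    # Stata: outsheet using my_file.csv, comma
    df.to_csv(csv_path, index=False)

    # Stata: use var1 var2 using my_file
    subset = pd.read_stata(dta_path, columns=['var1', 'var2'])
    # Stata: import delimited using my_file.csv
    from_csv = pd.read_csv(csv_path)

print(subset)
print(from_csv.shape)
```

Passing write_index=False and index=False keeps the pandas row index out of the files, which is closer to what Stata writes.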
2. Data management
Description | Stata | Python |
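Keep observations | keep if <condition> | df = df[<condition>] |
Keep observations where a is greater than 7 | keep if a > 7 | df = df[df['a'] > 7] |
Drop observations | drop if <condition> | df = df[~(<condition>)], where ~ is the logical negation operator in pandas and numpy (and bitwise negation in Python more generally). |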
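Keep the first observation | keep if _n == 1 | df = df.head(1) OR df = df.iloc[[0], :]. Python is a 0-indexed language, so elements of lists and arrays are counted from 0 instead of 1. |
Keep the last observation | keep if _n == _N | df = df.tail(1) OR df = df.iloc[[-1], :] |
Keep the 7th observation | keep if _n == 7 | df = df.iloc[[6], :] (remember to count from 0; df.iloc[6, :] returns the row as a Series instead) |
Keep the first 10 observations | keep if _n <= 10 | df = df.iloc[:10, :] (the end of a slice is excluded, so this selects positions 0 through 9) |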
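Keep one variable | keep var | df = df[['var']] (df['var'] returns a Series rather than a DataFrame) |
Keep variables var1 and var2 | keep var1 var2 | df = df[['var1', 'var2']] |
Keep variables whose names start with varstem | keep varstem* | df = df.filter(regex='^varstem') (filter(like='varstem') would match varstem anywhere in the name) |
Drop variable var | drop var | del df['var'] OR df = df.drop('var', axis=1) |
Drop variables var1 and var2 | drop var1 var2 | df = df.drop(['var1', 'var2'], axis=1) |
Drop variables whose names start with varstem | drop varstem* | df = df.drop(df.filter(regex='^varstem').columns, axis=1) |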
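A short sketch of the row and column selections in this table, run on a hypothetical DataFrame (the column names a, varstem_x, varstem_y and my_varstem are made up for illustration). It also shows why a prefix match needs a regular expression rather than like=.

```python
import pandas as pd

# Hypothetical data: 12 observations and four variables.
df = pd.DataFrame({
    'a': [5, 8, 9, 3, 10, 7, 12, 1, 6, 11, 4, 2],
    'varstem_x': range(12),
    'varstem_y': range(12, 24),
    'my_varstem': range(24, 36),
})

kept = df[df['a'] > 7]          # Stata: keep if a > 7
dropped = df[~(df['a'] > 7)]    # Stata: drop if a > 7
first_ten = df.iloc[:10, :]     # Stata: keep if _n <= 10 (positions 0..9)
seventh = df.iloc[[6], :]       # Stata: keep if _n == 7 (position 6)

# Stata: keep varstem*  (anchor the pattern at the start of the name)
prefix_cols = df.filter(regex='^varstem')
# like= is a substring match, so it also picks up 'my_varstem'
like_cols = df.filter(like='varstem')
# Stata: drop varstem*
without_prefix = df.drop(columns=prefix_cols.columns)

print(kept.shape, dropped.shape, first_ten.shape)
print(list(prefix_cols.columns), list(like_cols.columns))
```

Each selection here is assigned to a new name so the original df stays intact; assigning back to df, as in the table, mirrors Stata's in-place behaviour.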
3. Descriptive statistics
Description | Stata | Python |
Describe the dataset | describe | df.info() OR df.dtypes just to get data types. Note that Python does not have value labels like Stata does. |
Describe one variable | describe var | df['var'].dtype |
Count observations | count | df.shape[0] OR len(df). Here df.shape returns a tuple with the number of rows and the number of columns of the DataFrame. |
Count observations meeting a condition | count if <condition> | df[<condition>].shape[0] OR (<condition>).sum() if the condition is a boolean Series, e.g., (df['age'] > 2).sum() |
Summarize var | summ var | df['var'].describe() |
Summarize var for a subset | summ var if <condition> | df[<condition>]['var'].describe() OR df.loc[<condition>, 'var'].describe() |
Weighted summary | summ var [aw = <weight>] | Right now you have to calculate weighted summary statistics manually. There are also some tools available in the Statsmodels package. |
Detailed summary | summ var, d | df['var'].describe() plus df['var'].quantile([.1, .25, .5, .75, .9]) or whatever other statistics you want. |
Frequency table of var | tab var | df['var'].value_counts() |
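The counting and summary rows above, plus one way to compute a weighted mean by hand, might look like the following. The data and the column names age, weight and group are assumptions for illustration, and np.average is just one option for the manual calculation the table mentions; it covers the mean only, not the other statistics Stata reports.

```python
import numpy as np
import pandas as pd

# Hypothetical data with an analytic-weight column.
df = pd.DataFrame({
    'age': [1, 3, 5, 2, 4],
    'weight': [1.0, 2.0, 1.0, 1.0, 5.0],
    'group': ['a', 'b', 'a', 'a', 'b'],
})

n_obs = len(df)                                     # Stata: count
n_older = int((df['age'] > 2).sum())                # Stata: count if age > 2
summary = df.loc[df['age'] > 2, 'age'].describe()   # Stata: summ age if age > 2

# Stata: summ age [aw = weight]  (weighted mean only, computed manually)
weighted_mean = np.average(df['age'], weights=df['weight'])

# Stata: tab group
counts = df['group'].value_counts()

print(n_obs, n_older, weighted_mean)
print(summary)
print(counts)
```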
4. Panel data
Description | Stata | Python |
Declare panel data | tsset panelvar timevar | df = df.set_index(['panelvar', 'timevar']) |
Lag by one period | L.var | df['var'].shift() NOTE: The index must be correctly sorted for shift to work the way you want it to. With panel data you will also probably need a groupby on the panel variable so that lags do not cross from one unit into the next. |
Lag by two periods | L2.var | df['var'].shift(2) |
Lead by one period | F.var | df['var'].shift(-1) |
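The note on sorting and groupby can be made concrete with a two-unit panel. This is a minimal sketch on made-up data; it assumes every unit is observed in every period, because shift moves values by row position and, unlike Stata's L. operator, does not check for gaps in the time variable.

```python
import pandas as pd

# Hypothetical balanced panel: two units observed over three years.
df = pd.DataFrame({
    'panelvar': ['A', 'A', 'A', 'B', 'B', 'B'],
    'timevar': [2001, 2002, 2003, 2001, 2002, 2003],
    'var': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Stata: tsset panelvar timevar  (sort first, then index)
df = df.sort_values(['panelvar', 'timevar']).set_index(['panelvar', 'timevar'])

# Group by the panel unit so shifts stay inside each unit.
by_unit = df.groupby(level='panelvar')['var']
df['L_var'] = by_unit.shift()      # Stata: L.var
df['L2_var'] = by_unit.shift(2)    # Stata: L2.var
df['F_var'] = by_unit.shift(-1)    # Stata: F.var

# Without the groupby, the first period of unit B picks up
# the last value of unit A.
naive_lag = df['var'].shift()

print(df)
print(naive_lag)
```

In the printed output, L_var is missing for the first year of each unit, while naive_lag carries unit A's 2003 value into unit B's 2001 row.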
5. Econometric model commands
Stata | Python |
ttest var1, by(var2) | from scipy.stats import ttest_ind; ttest_ind(array1, array2), where array1 and array2 hold the values of var1 for the two groups defined by var2 |
xi: i.var | pd.get_dummies(df['var']) |
i.var2#c.var1 | pd.get_dummies(df['var2']).multiply(df['var1'], axis=0) |
reg yvar xvar if <condition>, r | import econtools.metrics as mt; results = mt.reg(df[<condition>], 'yvar', 'xvar', robust=True) |
reg yvar xvar if <condition>, vce(cluster cluster_var) | results = mt.reg(df[<condition>], 'yvar', 'xvar', cluster='cluster_var') |
areg yvar xvar1 xvar2, absorb(fe_var) | results = mt.reg(df, 'yvar', ['xvar1', 'xvar2'], fe_name='fe_var') |
predict newvar, resid | newvar = results.resid |
predict newvar, xb | newvar = results.yhat |
_b[var], _se[var] | results.beta['var'], results.se['var'] |
test var1 var2 | results.Ftest(['var1', 'var2']) |
test var1 var2, equal | results.Ftest(['var1', 'var2'], equal=True) |
lincom var1 + var2 | econtools.metrics.f_test with appropriate parameters. |
ivreg2 | econtools.metrics.ivreg |
outreg2 | econtools.outreg |
reghdfe | None (hoping to add it to Econtools soon). |
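The first rows of this table use only scipy and pandas, so they can be sketched without Econtools. The data below are made up, and the names array1, array2, dummies and interactions are illustrative. Note that ttest_ind assumes equal variances by default, which matches Stata's ttest default.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data: a continuous var1 and a two-level grouping variable var2.
df = pd.DataFrame({
    'var1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'var2': ['a', 'a', 'a', 'b', 'b', 'b'],
})

# Stata: ttest var1, by(var2)
array1 = df.loc[df['var2'] == 'a', 'var1']
array2 = df.loc[df['var2'] == 'b', 'var1']
t_stat, p_value = ttest_ind(array1, array2)

# Stata: xi: i.var2
dummies = pd.get_dummies(df['var2'], dtype=float)

# Stata: i.var2#c.var1
# axis=0 multiplies row by row; the default would try to align
# the index of var1 with the dummy column names instead.
interactions = dummies.multiply(df['var1'], axis=0)

print(t_stat, p_value)
print(interactions)
```

The interactions DataFrame has one column per level of var2, equal to var1 inside that level and 0 elsewhere, and can be passed to a regression routine as a set of regressors.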
Reposted from the WeChat public account "经管学苑"