Package org.opensextant.util
Class FileUtility
java.lang.Object
    org.opensextant.util.FileUtility
public class FileUtility extends java.lang.Object
Author: ubaldino
Field Summary
Modifier and Type | Field | Description
static java.lang.String | APP_MIMETYPE |
static java.lang.String | AUD_MIMETYPE |
static java.lang.String | COMMENT_CHAR | Char used in config files, dict files.
static java.lang.String | DATA_MIMETYPE |
static java.lang.String | default_encoding |
static java.lang.String | DOC_MIMETYPE |
static java.lang.String | FEED_MIMETYPE |
static char | FILENAME_REPLACE_CHAR | Char to use in place of special chars when scrubbing filenames.
static java.lang.String | FOLDER_MIMETYPE |
static java.lang.String | GIS_MIMETYPE |
static java.lang.String | IMAGE_MIMETYPE |
static java.lang.String | MESSAGE_MIMETYPE |
static java.lang.String | NOT_AVAILABLE |
static java.lang.String | SPREADSHEET_MIMETYPE |
static java.lang.String | VID_MIMETYPE |
static java.lang.String | WEBARCHIVE_MIMETYPE |
static java.lang.String | WEBPAGE_MIMETYPE |
-
Constructor Summary
Constructor | Description
FileUtility() |
-
Method Summary
Modifier and Type | Method | Description
static java.lang.String | filenameCleaner(java.lang.String fname) | Another utility to deal with Unicode in filenames.
static java.lang.String | generateUniqueFilename(java.lang.String F, java.lang.String Ext) | Generate a filename with a unique date/time stamp.
static java.lang.String | generateUniquePath(java.lang.String D, java.lang.String F, java.lang.String Ext) | Generate a path with a unique date/time stamp.
static java.lang.String | getBasename(java.lang.String p, java.lang.String ext) | Get the base name of a file, given any file extension.
static java.lang.String | getFileDescription(java.lang.String url) | Get a plain-language name for the type of file.
static java.io.FilenameFilter | getFilenameFilter(java.lang.String ext) | Simple filename filter.
static java.io.InputStreamReader | getInputStream(java.io.File f, java.lang.String enc) |
static java.io.InputStreamReader | getInputStream(java.lang.String fname, java.lang.String enc) |
static java.io.InputStreamReader | getInputStreamReader(java.io.File f, java.lang.String enc) | Get an input stream reader for a file.
static java.io.OutputStreamWriter | getOutputStream(java.lang.String fname, java.lang.String enc) | Caller is responsible for flushing, closing, etc.
static java.io.OutputStreamWriter | getOutputStream(java.lang.String fname, java.lang.String enc, boolean append) | Caller is responsible for flushing, closing, etc.
static java.io.File | getParent(java.io.File f) |
static java.io.File | getSafeDir(java.io.File dir, java.lang.String dupeMarker, int maxDups) | Get a directory that does not conflict with an existing directory.
static java.io.File | getSafeFile(java.io.File f, java.lang.String dupeMarker, int maxDups) |
static java.lang.String | getValidFilename(java.lang.String path) | On occasion a file path may contain Unicode characters; however, as the path is encoded, it may not be decodable by the OS/FS.
static boolean | isArchiveFile(java.lang.String filepath) | Check if a file is an archive.
static boolean | isArchiveFileType(java.lang.String ext) | Allow checking of a file extension; NO prefix ".".
static boolean | isAudio(java.lang.String filepath) | Checks the file extension of the given filepath to see if the format is a known audio type.
static boolean | isImage(java.lang.String filepath) | Using Commons getExtension(), determine whether the filename represents an image media type.
static boolean | isJSONGzip(java.lang.String path) | Tell if the file is JSON/Gzip.
static boolean | isPlainText(java.lang.String filepath) | Test if a path or file extension ends with .txt. Throws NullPointerException if null is passed in.
static boolean | isSpreadsheet(java.lang.String filepath) | Simple check whether a file is typed as a spreadsheet. Tab-delimited .txt or .dat files may be valid spreadsheets; however, this method does not look inside files.
static boolean | isVideo(java.lang.String filepath) | Checks the file extension of the given filepath to see if the format is a known video type.
static boolean | isWebURL(java.lang.String link) | Check if a path or URL is a web page.
static boolean | isWindowsSystem() | A way of determining the OS. Beware: OS X has "Darwin" in its full OS name.
static java.util.Set<java.lang.String> | loadDict(java.io.InputStream io, boolean case_sensitive) | The do-all method.
static java.util.Set<java.lang.String> | loadDictionary(java.io.File resourcepath, boolean case_sensitive) | Load a word list from a file path.
static java.util.Set<java.lang.String> | loadDictionary(java.lang.String resourcepath, boolean case_sensitive) | A generic word list loader.
static java.util.Set<java.lang.String> | loadDictionary(java.net.URL resourcepath, boolean case_sensitive) | A generic word list loader.
static boolean | makeDirectory(java.io.File testDir) | Utility for making directories.
static boolean | makeDirectory(java.lang.String dir) | Utility for making directories.
protected static char | normalizeFilenameChar(char c) | Tests for valid filename characters for simple normalization: A-Z, a-z, _, -, 0-9.
static java.lang.String | readFile(java.io.File filepath) |
static java.lang.String | readFile(java.io.File fileinput, java.lang.String enc) | Slurps a text file into a string and returns the string.
static java.lang.String | readFile(java.lang.String filepath) |
static java.lang.String | readGzipFile(java.lang.String filepath) |
static boolean | removeDirectory(java.io.File directory) | Java oddity: recursive removal of a directory.
static boolean | writeFile(java.lang.String buffer, java.lang.String fname) | Write a file; UTF-8 is the default charset here.
static boolean | writeFile(java.lang.String buffer, java.lang.String fname, java.lang.String enc, boolean append) |
static boolean | writeGzipFile(java.lang.String text, java.lang.String filepath) |
-
Field Detail
-
default_encoding
public static final java.lang.String default_encoding
- See Also:
- Constant Field Values
-
FILENAME_REPLACE_CHAR
public static final char FILENAME_REPLACE_CHAR
Char to use in place of special chars when scrubbing filenames.
- See Also:
- Constant Field Values
-
COMMENT_CHAR
public static final java.lang.String COMMENT_CHAR
Char used in config files, dict files.
- See Also:
- Constant Field Values
-
IMAGE_MIMETYPE
public static final java.lang.String IMAGE_MIMETYPE
- See Also:
- Constant Field Values
-
DOC_MIMETYPE
public static final java.lang.String DOC_MIMETYPE
- See Also:
- Constant Field Values
-
MESSAGE_MIMETYPE
public static final java.lang.String MESSAGE_MIMETYPE
- See Also:
- Constant Field Values
-
APP_MIMETYPE
public static final java.lang.String APP_MIMETYPE
- See Also:
- Constant Field Values
-
VID_MIMETYPE
public static final java.lang.String VID_MIMETYPE
- See Also:
- Constant Field Values
-
AUD_MIMETYPE
public static final java.lang.String AUD_MIMETYPE
- See Also:
- Constant Field Values
-
FOLDER_MIMETYPE
public static final java.lang.String FOLDER_MIMETYPE
- See Also:
- Constant Field Values
-
FEED_MIMETYPE
public static final java.lang.String FEED_MIMETYPE
- See Also:
- Constant Field Values
-
DATA_MIMETYPE
public static final java.lang.String DATA_MIMETYPE
- See Also:
- Constant Field Values
-
WEBARCHIVE_MIMETYPE
public static final java.lang.String WEBARCHIVE_MIMETYPE
- See Also:
- Constant Field Values
-
WEBPAGE_MIMETYPE
public static final java.lang.String WEBPAGE_MIMETYPE
- See Also:
- Constant Field Values
-
SPREADSHEET_MIMETYPE
public static final java.lang.String SPREADSHEET_MIMETYPE
- See Also:
- Constant Field Values
-
NOT_AVAILABLE
public static final java.lang.String NOT_AVAILABLE
- See Also:
- Constant Field Values
-
GIS_MIMETYPE
public static final java.lang.String GIS_MIMETYPE
- See Also:
- Constant Field Values
-
Method Detail
-
writeFile
public static boolean writeFile(java.lang.String buffer, java.lang.String fname) throws java.io.IOException
Write a file; UTF-8 is the default charset here.
- Parameters:
buffer - text to save
fname - name of file to save
- Returns:
true if the file was written
- Throws:
java.io.IOException - if the file had I/O errors.
-
writeFile
public static boolean writeFile(java.lang.String buffer, java.lang.String fname, java.lang.String enc, boolean append) throws java.io.IOException
- Parameters:
buffer - text to save
fname - name of file to save
enc - text encoding
append - true to add to an existing file
- Returns:
true if written
- Throws:
java.io.IOException - if the file had I/O errors.
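A minimal JDK-only sketch of the documented writeFile(buffer, fname, enc, append) behaviour, paired with a matching read. The class WriteFileSketch and both method bodies are hypothetical illustrations built on java.nio.file.Files, not FileUtility's source.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for writeFile/readFile; not the library's implementation.
final class WriteFileSketch {

    // Encode buffer with the named charset, then append or overwrite.
    static boolean writeFile(String buffer, String fname, String enc, boolean append) throws IOException {
        Path path = Paths.get(fname);
        byte[] bytes = buffer.getBytes(Charset.forName(enc));
        if (append) {
            // Create the file if missing, otherwise write at its end.
            Files.write(path, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // No options: create the file, or truncate an existing one.
            Files.write(path, bytes);
        }
        return true;
    }

    // Read the whole file and decode it with the same charset.
    static String readFile(String fname, String enc) throws IOException {
        return new String(Files.readAllBytes(Paths.get(fname)), Charset.forName(enc));
    }
}
```

Writing "caf\u00e9" and then appending " au lait" with "UTF-8" reads back as the single string "café au lait"; calling writeFile again with append=false replaces the content.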
-
getOutputStream
public static java.io.OutputStreamWriter getOutputStream(java.lang.String fname, java.lang.String enc, boolean append) throws java.io.IOException
Caller is responsible for flushing, closing, etc.
- Parameters:
fname - file path
enc - encoding
append - true to append data to an existing file
- Returns:
stream writer
- Throws:
java.io.IOException - if the stream could not be opened
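Because the caller owns the returned writer, try-with-resources is the natural way to guarantee the flush and close. The sketch below assumes getOutputStream wraps a FileOutputStream opened in append or overwrite mode; OutputStreamSketch and appendLines are hypothetical names for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical stand-in for getOutputStream(fname, enc, append); not FileUtility's code.
final class OutputStreamSketch {

    static OutputStreamWriter getOutputStream(String fname, String enc, boolean append) throws IOException {
        // Resolve the charset first so a bad name fails before the file is opened.
        Charset cs = Charset.forName(enc);
        return new OutputStreamWriter(new FileOutputStream(fname, append), cs);
    }

    // Caller-side pattern: try-with-resources flushes and closes the writer,
    // even if a write throws.
    static void appendLines(String fname, String... lines) throws IOException {
        try (Writer w = getOutputStream(fname, "UTF-8", true)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }
}
```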
-
getOutputStream
public static java.io.OutputStreamWriter getOutputStream(java.lang.String fname, java.lang.String enc) throws java.io.IOException
Caller is responsible for flushing, closing, etc.
- Parameters:
fname - file name
enc - text encoding
- Returns:
stream writer
- Throws:
java.io.IOException - if the stream could not be opened
-
getInputStreamReader
public static java.io.InputStreamReader getInputStreamReader(java.io.File f, java.lang.String enc) throws java.io.IOException
Get an input stream reader for a file.
- Parameters:
f - file object
enc - encoding of text data
- Returns:
reader
- Throws:
java.io.IOException - if the file cannot be read or the encoding cannot be used.
-
getInputStream
public static java.io.InputStreamReader getInputStream(java.lang.String fname, java.lang.String enc) throws java.io.IOException
- Throws:
java.io.IOException
-
getInputStream
public static java.io.InputStreamReader getInputStream(java.io.File f, java.lang.String enc) throws java.io.IOException
- Throws:
java.io.IOException
-
isSpreadsheet
public static boolean isSpreadsheet(java.lang.String filepath)
Simple check whether a file is typed as a spreadsheet. Tab-delimited .txt or .dat files may be valid spreadsheets; however, this method does not look inside files.
- Parameters:
filepath - path to file
- Returns:
true if the file represents one of the various spreadsheet file formats
-
isImage
public static boolean isImage(java.lang.String filepath)
Using Commons getExtension(), determine whether the filename represents an image media type.
- Parameters:
filepath - path to file
- Returns:
true if the file represents any type of image
-
isVideo
public static boolean isVideo(java.lang.String filepath)
Checks the file extension of the given filepath to see if the format is a known video type.
- Parameters:
filepath - file name or path
- Returns:
true if the file is likely a video file format.
-
isAudio
public static boolean isAudio(java.lang.String filepath)
Checks the file extension of the given filepath to see if the format is a known audio type.
- Parameters:
filepath - file name or path
- Returns:
true if the file is likely an audio file format.
-
isArchiveFile
public static boolean isArchiveFile(java.lang.String filepath)
Check if a file is an archive.
- Parameters:
filepath - path to file
- Returns:
true if the file ends with .zip, .tar, .tgz, or .gz (includes .tar.gz)
-
isArchiveFileType
public static boolean isArchiveFileType(java.lang.String ext)
Allow checking of a file extension; NO prefix ".".
- Parameters:
ext - extension to test
- Returns:
true if the extension is an archive type: zip, tar, tgz, or gz
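The two archive checks can be read as one rule: take the text after the last "." and compare it against zip, tar, tgz, and gz. A ".tar.gz" name needs no special case because its last extension is "gz". This is a hypothetical sketch; case-insensitive matching is an assumption, not documented behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of isArchiveFile/isArchiveFileType; not FileUtility's code.
final class ArchiveCheckSketch {

    // Extensions named in the docs; lower-casing input is an assumption.
    private static final Set<String> ARCHIVE_EXT =
            new HashSet<>(Arrays.asList("zip", "tar", "tgz", "gz"));

    // ext is given without the leading ".", as isArchiveFileType expects.
    static boolean isArchiveFileType(String ext) {
        return ARCHIVE_EXT.contains(ext.toLowerCase(Locale.ROOT));
    }

    // Only the last extension matters, so "dump.tar.gz" is checked as "gz".
    static boolean isArchiveFile(String filepath) {
        int dot = filepath.lastIndexOf('.');
        return dot >= 0 && isArchiveFileType(filepath.substring(dot + 1));
    }
}
```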
-
isPlainText
public static boolean isPlainText(java.lang.String filepath)
Test if a path or file extension ends with .txt. Throws NullPointerException if null is passed in.
- Parameters:
filepath - path or extension, including "."
- Returns:
true if it ends with .txt or .TXT
-
readFile
public static java.lang.String readFile(java.lang.String filepath) throws java.io.IOException
- Parameters:
filepath - path to file
- Returns:
buffer from file
- Throws:
java.io.IOException - on error
-
readFile
public static java.lang.String readFile(java.io.File filepath) throws java.io.IOException
- Parameters:
filepath - path to file
- Returns:
buffer from file
- Throws:
java.io.IOException - on error
-
readFile
public static java.lang.String readFile(java.io.File fileinput, java.lang.String enc) throws java.io.IOException
Slurps a text file into a string and returns the string.
- Parameters:
fileinput - file object
enc - text encoding
- Returns:
buffer from file
- Throws:
java.io.IOException - on error
-
readGzipFile
public static java.lang.String readGzipFile(java.lang.String filepath) throws java.io.IOException
- Parameters:
filepath - path to file
- Returns:
text buffer, UTF-8 decoded
- Throws:
java.io.IOException - on error
-
writeGzipFile
public static boolean writeGzipFile(java.lang.String text, java.lang.String filepath) throws java.io.IOException
- Parameters:
text - buffer to write
filepath - path to file
- Returns:
true if the file was written
- Throws:
java.io.IOException - on error
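A round trip through gzip with UTF-8 text, matching the documented "UTF-8 decoded" return of readGzipFile, can be sketched with java.util.zip alone. GzipTextSketch is a hypothetical name; the actual FileUtility implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of writeGzipFile/readGzipFile; not FileUtility's code.
final class GzipTextSketch {

    static boolean writeGzipFile(String text, String filepath) throws IOException {
        // Closing the writer finishes the gzip trailer, then closes the file.
        try (OutputStream file = Files.newOutputStream(Paths.get(filepath));
             GZIPOutputStream gz = new GZIPOutputStream(file);
             Writer w = new OutputStreamWriter(gz, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return true;
    }

    static String readGzipFile(String filepath) throws IOException {
        try (InputStream file = Files.newInputStream(Paths.get(filepath));
             GZIPInputStream gz = new GZIPInputStream(file)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Decode once at the end so multi-byte UTF-8 sequences are never split.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```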
-
makeDirectory
public static boolean makeDirectory(java.io.File testDir) throws java.io.IOException
Utility for making directories.
- Parameters:
testDir - directory to test
- Returns:
true if the directory was created or already exists
- Throws:
java.io.IOException - if testDir was not created
-
makeDirectory
public static boolean makeDirectory(java.lang.String dir) throws java.io.IOException
Utility for making directories.
- Parameters:
dir - directory path
- Returns:
true if the directory was created or already exists
- Throws:
java.io.IOException - if dir was not created
-
removeDirectory
public static boolean removeDirectory(java.io.File directory)
Java oddity: recursive removal of a directory, since File.delete() cannot remove a non-empty directory.
- Parameters:
directory - directory to remove
- Returns:
true if all contents and the directory itself were removed.
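One common JDK idiom for recursive removal walks the tree and deletes paths deepest-first, so every directory is already empty when its turn comes. This is a hypothetical sketch (RemoveDirectorySketch), not the library's implementation; returning false for a missing directory is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch of removeDirectory; not FileUtility's code.
final class RemoveDirectorySketch {

    static boolean removeDirectory(File directory) {
        Path root = directory.toPath();
        if (!Files.exists(root)) {
            return false; // assumption: nothing to remove counts as failure
        }
        // Files.walk yields the root and all descendants. A descendant's path
        // always sorts after its parent, so reverse order deletes children first.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            return true;
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }
}
```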
-
generateUniquePath
public static java.lang.String generateUniquePath(java.lang.String D, java.lang.String F, java.lang.String Ext)
Generate a path with a unique date/time stamp.
- Parameters:
D - directory
F - filename
Ext - file extension
- Returns:
unique path
-
generateUniqueFilename
public static java.lang.String generateUniqueFilename(java.lang.String F, java.lang.String Ext)
Generate a filename with a unique date/time stamp.
- Parameters:
F - filename
Ext - file extension
- Returns:
unique filename
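The stamp format FileUtility uses is not documented on this page, so the sketch below picks a hypothetical millisecond-resolution pattern and assumes Ext already includes its leading ".". UniqueNameSketch and STAMP_PATTERN are illustrative names only.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch of generateUniqueFilename/generateUniquePath.
final class UniqueNameSketch {

    // Assumed pattern: 8-digit date, 'T', then hours, minutes, seconds, milliseconds.
    private static final String STAMP_PATTERN = "yyyyMMdd'T'HHmmssSSS";

    static String generateUniqueFilename(String f, String ext) {
        // SimpleDateFormat is not thread-safe, so build one per call.
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        String stamp = new SimpleDateFormat(STAMP_PATTERN, Locale.ROOT).format(new Date());
        return f + "_" + stamp + ext;
    }

    static String generateUniquePath(String d, String f, String ext) {
        return new File(d, generateUniqueFilename(f, ext)).getPath();
    }
}
```

A time stamp is only unique if calls are at least one clock tick apart; two calls within the same millisecond return the same name, so a caller that needs a hard guarantee should still check for an existing file.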
-
getParent
public static java.io.File getParent(java.io.File f)
- Parameters:
f - the file in question.
- Returns:
the parent File of the given file.
-
getFilenameFilter
public static java.io.FilenameFilter getFilenameFilter(java.lang.String ext)
Simple filename filter.
- Parameters:
ext - the extension to filter on
- Returns:
filename filter
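java.io.FilenameFilter has a single method, accept(File dir, String name), so a suffix filter fits in one lambda. This hypothetical sketch (FilterSketch) matches case-sensitively; whether FileUtility's filter does is not stated.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical sketch of getFilenameFilter; not FileUtility's code.
final class FilterSketch {

    // Accept entries whose name ends with ext, e.g. ".txt" (case-sensitive).
    static FilenameFilter getFilenameFilter(String ext) {
        return (File dir, String name) -> name.endsWith(ext);
    }

    // Typical use: File.listFiles returns null if dir is not a readable directory.
    static File[] listByExtension(File dir, String ext) {
        return dir.listFiles(getFilenameFilter(ext));
    }
}
```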
-
getBasename
public static java.lang.String getBasename(java.lang.String p, java.lang.String ext)
Get the base name of a file, given any file extension. This finds the right-most instance of the file extension and returns the left-hand side as the file basename. Commons IO FilenameUtils says nothing about arbitrarily long file extensions, e.g., file.a.b.c.txt split into ("file" + "a.b.c.txt").
- Parameters:
p - path
ext - extension
- Returns:
basename of the path, less the extension
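The right-most-match rule can be sketched with String.lastIndexOf. Two points are assumptions of this hypothetical BasenameSketch, not documented behaviour: directory components are dropped first, and a "." left between basename and extension is trimmed when ext omits it.

```java
import java.io.File;

// Hypothetical sketch of getBasename; not FileUtility's code.
final class BasenameSketch {

    static String getBasename(String p, String ext) {
        String name = new File(p).getName();   // assumption: strip directories
        int i = name.lastIndexOf(ext);         // right-most occurrence of ext
        if (i <= 0) {
            return name;                       // ext absent, or nothing precedes it
        }
        String base = name.substring(0, i);
        // Drop the separating "." when ext was given without it.
        return base.endsWith(".") ? base.substring(0, base.length() - 1) : base;
    }
}
```

With this rule, getBasename("file.a.b.c.txt", "a.b.c.txt") gives "file", while the same name with ext "txt" gives "file.a.b.c".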
-
getValidFilename
public static java.lang.String getValidFilename(java.lang.String path)
On occasion a file path may contain Unicode characters; however, as the path is encoded, it may not be decodable by the OS/FS.
- Parameters:
path - path to normalize
- Returns:
filename
-
filenameCleaner
public static java.lang.String filenameCleaner(java.lang.String fname)
Another utility to deal with Unicode in filenames.
- Parameters:
fname - name to clean
- Returns:
cleaner filename
-
getSafeDir
public static java.io.File getSafeDir(java.io.File dir, java.lang.String dupeMarker, int maxDups)
Get a directory that does not conflict with an existing directory. Returns null if that is not possible within maxDups.
- Parameters:
dir - directory
dupeMarker - incrementor
maxDups - max incrementor
- Returns:
file object
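The documented contract (find a non-conflicting directory or return null after maxDups attempts) can be sketched as a bounded loop. The naming scheme name + dupeMarker + n used here is a hypothetical choice; the page does not say how FileUtility builds candidate names.

```java
import java.io.File;

// Hypothetical sketch of getSafeDir; the candidate naming scheme is assumed.
final class SafeDirSketch {

    static File getSafeDir(File dir, String dupeMarker, int maxDups) {
        if (!dir.exists()) {
            return dir; // no conflict, use it as-is
        }
        for (int n = 1; n <= maxDups; n++) {
            // A null parent is fine: File(null, child) behaves like File(child).
            File candidate = new File(dir.getParentFile(), dir.getName() + dupeMarker + n);
            if (!candidate.exists()) {
                return candidate;
            }
        }
        return null; // every candidate within maxDups is taken
    }
}
```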
-
getSafeFile
public static java.io.File getSafeFile(java.io.File f, java.lang.String dupeMarker, int maxDups)
- Parameters:
f - file object
dupeMarker - incrementor
maxDups - max incrementor
- Returns:
new file
-
normalizeFilenameChar
protected static char normalizeFilenameChar(char c)
Tests for valid filename characters for simple normalization: A-Z, a-z, _, -, 0-9.
- Parameters:
c - character to test
- Returns:
the given character or the replacement character
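The whitelist above (A-Z, a-z, _, -, 0-9) can be sketched as a per-character test, with a string-level helper that applies it to a whole name. The replacement value '_' and the helper clean() are hypothetical; the real FILENAME_REPLACE_CHAR value is only listed under Constant Field Values.

```java
// Hypothetical sketch of normalizeFilenameChar plus an illustrative helper.
final class FilenameCharSketch {

    // Assumed stand-in for FILENAME_REPLACE_CHAR.
    static final char REPLACE_CHAR = '_';

    static char normalizeFilenameChar(char c) {
        boolean allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
        return allowed ? c : REPLACE_CHAR;
    }

    // Hypothetical helper: normalize every character of a name.
    static String clean(String fname) {
        StringBuilder sb = new StringBuilder(fname.length());
        for (char c : fname.toCharArray()) {
            sb.append(normalizeFilenameChar(c));
        }
        return sb.toString();
    }
}
```

Note that "." is outside the whitelist, so clean("a.txt") yields "a_txt"; a caller that wants to keep the extension would split it off first.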
-
isWindowsSystem
public static boolean isWindowsSystem()
A way of determining the OS. Beware: OS X has "Darwin" in its full OS name.
- Returns:
true if the OS is Windows-based
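The "Darwin" warning matters because a naive contains("win") test on a lower-cased name matches "darwin". A prefix test avoids that. This hypothetical sketch reads the standard os.name system property; FileUtility's exact check is not shown here.

```java
import java.util.Locale;

// Hypothetical sketch of isWindowsSystem; not FileUtility's code.
final class OsCheckSketch {

    // startsWith, not contains("win"): "darwin" contains "win".
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    static boolean isWindowsSystem() {
        return isWindows(System.getProperty("os.name"));
    }
}
```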
-
loadDictionary
public static java.util.Set<java.lang.String> loadDictionary(java.lang.String resourcepath, boolean case_sensitive) throws java.io.IOException
A generic word list loader.
- Parameters:
resourcepath - classpath location of a resource
case_sensitive - whether terms are loaded with case preserved
- Returns:
Set containing unique words found in resourcepath
- Throws:
java.io.IOException - on error, or if the resource does not exist
-
loadDictionary
public static java.util.Set<java.lang.String> loadDictionary(java.net.URL resourcepath, boolean case_sensitive) throws java.io.IOException
A generic word list loader.
- Parameters:
resourcepath - URL of the resource
case_sensitive - whether terms are loaded with case preserved
- Returns:
Set containing unique words found in resourcepath
- Throws:
java.io.IOException - on error, or if the resource does not exist
-
loadDict
public static java.util.Set<java.lang.String> loadDict(java.io.InputStream io, boolean case_sensitive) throws java.io.IOException
The do-all method. Loads the dictionary from a stream and closes the stream when done.
- Parameters:
io - stream
case_sensitive - true if data should be loaded preserving case
- Returns:
set of phrases from the stream.
- Throws:
java.io.IOException - on I/O error
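A line-per-term loader that closes its stream, as loadDict is documented to do, can be sketched with BufferedReader and try-with-resources. Several details are assumptions of this hypothetical DictSketch: the stream is UTF-8, "#" stands in for COMMENT_CHAR (whose value is not shown on this page), blank lines are skipped, terms are trimmed, and case-insensitive loading lower-cases terms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of loadDict; not FileUtility's code.
final class DictSketch {

    // Assumed comment prefix standing in for COMMENT_CHAR.
    static final String COMMENT = "#";

    static Set<String> loadDict(InputStream io, boolean caseSensitive) throws IOException {
        Set<String> terms = new HashSet<>();
        // Closing the reader also closes the wrapped stream.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(io, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                String term = line.trim();
                if (term.isEmpty() || term.startsWith(COMMENT)) {
                    continue; // skip blanks and comment lines
                }
                terms.add(caseSensitive ? term : term.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```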
-
loadDictionary
public static java.util.Set<java.lang.String> loadDictionary(java.io.File resourcepath, boolean case_sensitive) throws java.io.IOException
Load a word list from a file path.
- Parameters:
resourcepath - File object to load
case_sensitive - whether the dictionary is loaded with case preserved
- Returns:
a Set containing distinct dictionary terms
- Throws:
java.io.IOException - if load fails
-
getFileDescription
public static java.lang.String getFileDescription(java.lang.String url)
Get a plain-language name for the type of file, e.g., document, image, spreadsheet, or web page, rather than the technical MIME type descriptor.
- Parameters:
url - item to describe
- Returns:
plain-language description of the URL
-
isWebURL
public static boolean isWebURL(java.lang.String link)
Check if a path or URL is a web page. This is helpful for examining URLs found in unstructured data.
- Parameters:
link - a URL
- Returns:
true if the link looks like a URL (i.e., if it starts with http: or https:)
-
isJSONGzip
public static boolean isJSONGzip(java.lang.String path)
Tell if the file is JSON/Gzip.
- Parameters:
path - input file path
- Returns:
true if the file ends with json.gz, or contains json and ends with .gz
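The isWebURL and isJSONGzip rules above are plain string tests. Any name ending in "json.gz" also contains "json" and ends in ".gz", so the second documented condition alone covers both cases. This hypothetical PathCheckSketch lower-cases its input, which is an assumption rather than documented behaviour.

```java
import java.util.Locale;

// Hypothetical sketch of isWebURL/isJSONGzip; case-insensitivity is assumed.
final class PathCheckSketch {

    static boolean isWebURL(String link) {
        String s = link.toLowerCase(Locale.ROOT);
        return s.startsWith("http:") || s.startsWith("https:");
    }

    // "ends with json.gz" implies "contains json and ends with .gz".
    static boolean isJSONGzip(String path) {
        String s = path.toLowerCase(Locale.ROOT);
        return s.endsWith(".gz") && s.contains("json");
    }
}
```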
-